Re: SSTableloader questions
> Can the sstableloader job run from outside a Cassandra node? or it has to be run from inside Cassandra node.

Yes. I'm a fan of running sstableloader on a server that is not one of the nodes in the cluster. You can maximise the throughput by running multiple instances of sstableloader, loading SSTables from separate sources/filesystems.

My suspicion is that the failed connection to the nodes is due to the SSL options, so check that you've specified the truststore/keystore correctly. Cheers!
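Since a refused connection can also be plain network reachability rather than SSL, a quick TCP probe of the storage port from the loader host can rule that out first. A minimal sketch (the host/port values below are placeholders, not the thread's actual cluster):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeout, unreachable host, etc.
        return False

# e.g. probe the ssl_storage_port (7001) on each -d host before streaming:
# for ip in ("ip1", "ip2", "ip3"):
#     print(ip, can_connect(ip, 7001))
```

If this returns False for a node, the problem is firewalling or the wrong storage port, and no amount of truststore/keystore fiddling will fix it.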
Re: SSTableloader questions
Hello Erick, I have one more question. Can the sstableloader job run from outside a Cassandra node, or does it have to be run from inside a Cassandra node? When I tried it from the Cassandra node it worked, but when I try to run it from outside the Cassandra cluster (a standalone machine which doesn't have any Cassandra process running) using the command below, it fails with a streaming error.

Command:

    $ /root/apache-cassandra-3.11.6/bin/sstableloader -d ip1,ip2,ip3 keyspace1/table1 \
        --truststore truststore.p12 --truststore-password cassandra \
        --keystore-password cassandra --keystore keystore.p12 \
        -v -u user -pw password --ssl-storage-port 7001 -prtcl TLS

Errors:

    ERROR 21:48:22,078 [Stream #be7a0de0-2530-11eb-bc56-c7c5c59d560b] Streaming error occurred on session with peer 10.66.129.194
    java.net.ConnectException: Connection refused
        at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_272]
        at sun.nio.ch.Net.connect(Net.java:482) ~[na:1.8.0_272]
        at sun.nio.ch.Net.connect(Net.java:474) ~[na:1.8.0_272]
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647) ~[na:1.8.0_272]
        at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_272]
        at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:283) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:270) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:269) [apache-cassandra-3.11.6.jar:3.11.6]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_272]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_272]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.6.jar:3.11.6]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]

    progress: total: 100% 0.000KiB/s (avg: 0.000KiB/s)

On Mon, Nov 9, 2020 at 3:08 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

> Thanks Erick, I will go through the posts and get back if I have any questions.
Re: SSTableloader questions
Thanks Erick, I will go through the posts and get back if I have any questions.

On Mon, Nov 9, 2020 at 1:58 PM Erick Ramirez wrote:

> A few months ago, I was asked a similar question so I wrote instructions for this. It depends on whether the clusters are identical or not. The posts define what "identical" means.
>
> If the source and target cluster are identical in configuration, follow the procedure here -- https://community.datastax.com/questions/4534/.
>
> If the source and target cluster have different configurations, follow the procedure here -- https://community.datastax.com/questions/4477/. Cheers!
Re: SSTableloader questions
A few months ago, I was asked a similar question so I wrote instructions for this. It depends on whether the clusters are identical or not. The posts define what "identical" means.

If the source and target cluster are identical in configuration, follow the procedure here -- https://community.datastax.com/questions/4534/.

If the source and target cluster have different configurations, follow the procedure here -- https://community.datastax.com/questions/4477/. Cheers!
SSTableloader questions
Hello, I have a few questions regarding restoring data from snapshots using sstableloader. If I have a 6-node Cassandra cluster with vnodes (256), and I have taken a snapshot of all 6 nodes and have to restore to another cluster:

1. Does the target cluster have to be the same size?
2. If 1 is true, does sstableloader have to use each snapshot from the source cluster and map it to the target nodes? source1 -> target1, source2 -> target2, source3 -> target3, source4 -> target4, source5 -> target5, source6 -> target6
3. If 1 is false, do I need to run sstableloader for all 6 snapshots from the source against the 3 nodes in the target?
4. Can I have a different schema (only the keyspace name) between the source and target clusters? e.g. keyspace1 in the source cluster but keyspace2 in the target.

Thanks in advance.
Re: sstableloader - warning vs. failure?
Ok, thanks very much for the answer!

On Fri, Feb 7, 2020 at 9:00 PM Erick Ramirez wrote:

>> INFO [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
>
> The message gets logged when SSTables are being cached and the cache fills up faster than objects are evicted from it. Note that the message is logged at INFO level (instead of WARN or ERROR) because there is no detrimental effect, but there will be a performance hit in the form of read latency. When space becomes available, it will just continue on to cache the next 64k chunk of the sstable.
>
> FWIW the default cache size (file_cache_size_in_mb in cassandra.yaml) is 512 MB (max memory of 536870912 in the log entry). Cheers!
Re: sstableloader - warning vs. failure?
> INFO [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576

The message gets logged when SSTables are being cached and the cache fills up faster than objects are evicted from it. Note that the message is logged at INFO level (instead of WARN or ERROR) because there is no detrimental effect, but there will be a performance hit in the form of read latency. When space becomes available, it will just continue on to cache the next 64k chunk of the sstable.

FWIW the default cache size (file_cache_size_in_mb in cassandra.yaml) is 512 MB (max memory of 536870912 in the log entry). Cheers!
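For reference, the byte counts in that log line line up with the defaults described above; a trivial sketch of the arithmetic:

```python
# Values copied from the log line itself.
max_memory = 536_870_912   # "Maximum memory usage reached", in bytes
chunk = 1_048_576          # "cannot allocate chunk of", in bytes

# 536870912 bytes is exactly 512 MiB, i.e. the default file_cache_size_in_mb.
print(max_memory // (1024 * 1024))  # 512
# The chunk it failed to allocate is exactly 1 MiB.
print(chunk // 1024)                # 1024 (KiB)
```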
sstableloader - warning vs. failure?
Hi folks,

When sstableloader hits a very large sstable, Cassandra may end up logging a message like this:

    INFO [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576

The loading process doesn't abort, and the sstableloader stdout logging appears to end up reporting success, e.g., with a few 100% totals across the nodes reported:

    progress: [/10.0.1.116]0:11/11 100% [/10.0.1.248]0:11/11 100% [/10.0.1.93]0:11/11 100% total: 100% 0.000KiB/s (avg: 36.156MiB/s)
    progress: [/10.0.1.116]0:11/11 100% [/10.0.1.248]0:11/11 100% [/10.0.1.93]0:11/11 100% total: 100% 0.000KiB/s (avg: 34.914MiB/s)
    progress: [/10.0.1.116]0:11/11 100% [/10.0.1.248]0:11/11 100% [/10.0.1.93]0:11/11 100% total: 100% 0.000KiB/s (avg: 33.794MiB/s)

    Summary statistics:
        Connections per host    : 1
        Total files transferred : 33
        Total bytes transferred : 116.027GiB
        Total duration          : 3515748 ms
        Average transfer rate   : 33.794MiB/s
        Peak transfer rate      : 53.130MiB/s

In these situations, is sstableloader hitting the memory issue and then retrying a few times until it succeeds? Or is it silently dropping data on the floor? I'd assume the former, but thought it'd be good to ask you folks to be sure...

Jim
Re: sstableloader: How much does it actually need?
Just mulling this based on some code and log digging I was doing while trying to have Reaper stay on top of our cluster. I think maybe the caveat here relates to eventual consistency. C* doesn't do state changes as distributed transactions. The assumption here is that RF=3 implies that at any given instant in real time, either the data is visible nowhere, or it is visible in 3 places. That's a conceptual simplification, but not a real-time invariant when you don't have a transactional horizon to perfectly determine visibility of data.

When you have C* usage antipatterns, like a client that is determined to read back data it just wrote as though there were a session context that somehow provided repeatable-read guarantees, under the covers in the logs you can see C* fighting to do on-the-fly repairs to push through the requested level of consistency before responding to the query. Which means that, for some period of time, achieving consistency was still work in flight.

I've also read about some boundary screw cases, like drift in time resolution between servers creating the opportunity for stale data, which repairs I think would fix. I haven't tested the scenario though, so I'm not sure how real the situation is.

Bottom line though: minus repairs, I think having all the nodes is getting you all your chances to repair the problems. And if the data is mutating as you are grabbing it, the entire frontier of changes is 'minus repairs'. Since tokens are distributed somewhat randomly, you don't know where you need to make up the differences afterwards. That's about as far as my navel gazing goes on that.

From: manish khandelwal
Reply-To: "user@cassandra.apache.org"
Date: Friday, February 7, 2020 at 12:22 AM
To: "user@cassandra.apache.org"
Subject: Re: sstableloader: How much does it actually need?
Re: sstableloader: How much does it actually need?
Yes, you will have all the data in two nodes, provided there are no mutation drops at the node level, or the data is repaired.

For example, suppose your data is A, B, C and D, with RF=3 and 4 nodes (node1, node2, node3 and node4):

    Data A is on node1, node2 and node3
    Data B is on node2, node3 and node4
    Data C is on node3, node4 and node1
    Data D is on node4, node1 and node2

With this configuration, any *two nodes combined* will give all the data.

Regards
Manish

On Fri, Feb 7, 2020 at 12:53 AM Voytek Jarnot wrote:

> Been thinking about it, and I can't really see how with 4 nodes and RF=3, any 2 nodes would *not* have all the data; but am more than willing to learn.
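This placement example can be checked mechanically. A small sketch (node and datum names are the hypothetical ones from the example above) verifying that every 2-node combination covers all four data items:

```python
from itertools import combinations

# RF=3 on 4 nodes: each datum lives on 3 consecutive nodes of the ring.
nodes = ["node1", "node2", "node3", "node4"]
replicas = {
    "A": {"node1", "node2", "node3"},
    "B": {"node2", "node3", "node4"},
    "C": {"node3", "node4", "node1"},
    "D": {"node4", "node1", "node2"},
}
all_data = set(replicas)

for pair in combinations(nodes, 2):
    # Data items for which at least one replica sits on one of the two nodes.
    held = {d for d, r in replicas.items() if r & set(pair)}
    assert held == all_data  # any two nodes combined hold every datum
```

Note this only confirms the token-placement argument; it says nothing about dropped mutations, which is exactly the caveat raised in the message above.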
Re: sstableloader: How much does it actually need?
Been thinking about it, and I can't really see how with 4 nodes and RF=3, any 2 nodes would *not* have all the data; but am more than willing to learn.

On the other thing: that's an attractive option, but in our case, the target cluster will likely come into use before the source-cluster data is available to load. Seemed to me the safest approach was sstableloader.

Thanks

On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez wrote:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the full copy of data. I'd rather not say "it depends".
>
> TIP: If the nodes in the target cluster have identical tokens allocated, you can just do a straight copy of the sstables node-for-node then do nodetool refresh. If the target cluster is already built and you can't assign the same tokens then sstableloader is your only option. Cheers!
>
> P.S. No need to apologise for asking questions. That's what we're all here for. Just keep them coming.
Re: sstableloader: How much does it actually need?
> Another option is the DSE bulk loader, but it will require converting to CSV/JSON (a good option if you don't want to play with sstableloader and deal with getting all the sstables from all the nodes):
> https://docs.datastax.com/en/dsbulk/doc/index.html

Thanks, Sergio. The DataStax Bulk Loader was developed for a completely different use case. It doesn't really make sense to go through the trouble of converting the SSTables to CSV/JSON when you've already got the SSTables to begin with. ☺ It was really designed for loading/unloading data from non-C* sources as a replacement for the COPY command. Cheers!
Re: sstableloader: How much does it actually need?
Another option is to use the Spark migrator; it reads a source CQL cluster and writes to another. It has a validation stage that compares a full scan and reports the diff: https://github.com/scylladb/scylla-migrator

There are many more ways to clone a cluster. My main recommendation is to 'optimize' for correctness and simplicity first, and only last optimize for performance/time. Machine time for such a rare operation is cheap, engineering time is expensive, and data inconsistency is priceless.

On Wed, Feb 5, 2020 at 5:24 PM Sergio wrote:

> Another option is the DSE bulk loader, but it will require converting to CSV/JSON (a good option if you don't want to play with sstableloader and deal with getting all the sstables from all the nodes):
> https://docs.datastax.com/en/dsbulk/doc/index.html
>
> Cheers
>
> Sergio
Re: sstableloader: How much does it actually need?
Another option is the DSE bulk loader, but it will require converting to CSV/JSON (a good option if you don't want to play with sstableloader and deal with getting all the sstables from all the nodes): https://docs.datastax.com/en/dsbulk/doc/index.html

Cheers

Sergio

On Wed, Feb 5, 2020 at 4:56 PM Erick Ramirez wrote:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the full copy of data. I'd rather not say "it depends".
>
> TIP: If the nodes in the target cluster have identical tokens allocated, you can just do a straight copy of the sstables node-for-node then do nodetool refresh. If the target cluster is already built and you can't assign the same tokens then sstableloader is your only option. Cheers!
Re: sstableloader: How much does it actually need?
Unfortunately, there isn't a guarantee that 2 nodes alone will have the full copy of data. I'd rather not say "it depends".

TIP: If the nodes in the target cluster have identical tokens allocated, you can just do a straight copy of the sstables node-for-node then do nodetool refresh. If the target cluster is already built and you can't assign the same tokens, then sstableloader is your only option. Cheers!

P.S. No need to apologise for asking questions. That's what we're all here for. Just keep them coming.
sstableloader: How much does it actually need?
Scenario: Cassandra 3.11.x, 4 nodes, RF=3; moving to an identically-sized cluster via snapshots and sstableloader.

As far as I can tell, in the topology given above, any 2 nodes contain all of the data. In terms of migrating this cluster, would there be any downsides or risks with snapshotting and loading (sstableloader) only 2 of the nodes rather than all 4?

Apologies for the spate of hypotheticals lately; this project is making life interesting.

Thanks,
Voytek Jarnot
Re: [EXTERNAL] Re: sstableloader & num_tokens change
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots from 4 nodes to 4 nodes (RF 3 on both ends), and did not notice a spike. That's not to say that it didn't happen, but I think I'd have noticed, as I was loading approx 250GB x 4 (although sequentially rather than 4x sstableloader in parallel).

Also, thanks to everyone for confirming no issue with num_tokens and sstableloader; appreciate it.

On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R wrote:

> I would suggest to be aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), it will also get written to the RF of the new cluster (3 more times). So, you could see data expansion of 9x the original data size (or, origin RF * target RF), until compaction can run.
>
> Sean Durity – Staff Systems Engineer, Cassandra
RE: [EXTERNAL] Re: sstableloader & num_tokens change
I would suggest being aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), it will also get written to the RF of the new cluster (3 more times). So, you could see data expansion of 9x the original data size (or, origin RF * target RF), until compaction can run.

Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change

> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
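The 9x figure above falls straight out of the two replication factors; a trivial sketch of the worst-case pre-compaction footprint:

```python
origin_rf = 3   # copies already present across the set of source snapshots
target_rf = 3   # copies the target cluster writes for every streamed row

# If every source node's snapshot is loaded, each row is streamed origin_rf
# times, and the target replicates each streamed copy target_rf times.
worst_case_expansion = origin_rf * target_rf
print(worst_case_expansion)  # 9x the logical data size, until compaction dedupes
```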
Re: sstableloader & num_tokens change
Hello

Concerning the original question, I agree with @erick_ramirez: sstableloader is transparent to the token allocation number.

Just for info @voytek, check this post out: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

You may be interested to know whether your cluster is well balanced with 32 tokens. 32 tokens seems to be the future default value, but changing the default vnode token number seems not to be so straightforward.

cheers

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay

On Sat, Jan 25, 2020 at 5:05 AM Erick Ramirez wrote:

> On the subject of DSBulk, sstableloader is the tool of choice for this scenario.
>
> +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!
Re: sstableloader & num_tokens change
On the subject of DSBulk, sstableloader is the tool of choice for this scenario.

+1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!
Re: sstableloader & num_tokens change
> If I may just loop this back to the question at hand: > > I'm curious if there are any gotchas with using sstableloader to restore > snapshots taken from 256-token nodes into a cluster with 32-token (or your > preferred number of tokens) nodes (otherwise same # of nodes and same RF). > No, there isn't. It will work as designed so you're good to go. Cheers! >
Re: sstableloader & num_tokens change
If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

On Fri, Jan 24, 2020 at 11:15 AM Sergio wrote:

> https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html
>
> Just skimming through the docs, I see examples of loading from CSV/JSON. Maybe there is some other command or doc page that I am missing.
Re: sstableloader & num_tokens change
https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html

Just skimming through the docs, I see examples of loading from CSV/JSON. Maybe there is some other command or doc page that I am missing.

On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth wrote:

> Dsbulk works the same as sstableloader.
Re: sstableloader & num_tokens change
Dsbulk works the same as sstableloader.

Regards,
Nitan
Cell: 510 449 9629

On Jan 24, 2020, at 10:40 AM, Sergio wrote:

> I was wondering if that improvement for token allocation would work even with just one rack. It should, but I am not sure.
>
> Does Dsbulk support cluster-to-cluster migration without CSV or JSON export?
>
> Thanks and Regards
Re: sstableloader & num_tokens change
Why? It seems to me that the old Cassandra -> CSV/JSON and CSV/JSON -> new Cassandra steps are unnecessary in my case.

On Fri, Jan 24, 2020 at 10:34 AM Nitan Kainth wrote:

> Instead of sstableloader, consider dsbulk by DataStax.
Re: sstableloader & num_tokens change
I was wondering if that improvement for token allocation would work even with just one rack. It should, but I am not sure.

Does Dsbulk support migration cluster to cluster without CSV or JSON export?

Thanks and Regards

On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth wrote:
> Instead of sstableloader consider dsbulk by datastax.
Re: sstableloader & num_tokens change
Instead of sstableloader consider dsbulk by datastax.

On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback wrote:
> Jon Haddad has previously made the case for num_tokens=4. His Accelerate 2019 talk is available at: https://www.youtube.com/watch?v=swL7bCnolkU
Re: sstableloader & num_tokens change
Jon Haddad has previously made the case for num_tokens=4. His Accelerate 2019 talk is available at:

https://www.youtube.com/watch?v=swL7bCnolkU

You might want to check that out. Also, I think the amount of effort you put into evening out the token distribution increases as vnode count shrinks. The caveats are explored at:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

From: Voytek Jarnot
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 24, 2020 at 10:39 AM
To: "user@cassandra.apache.org"
Subject: sstableloader & num_tokens change
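For a concrete picture of the target cluster's configuration, a minimal sketch of the relevant cassandra.yaml settings (the keyspace name is a placeholder; `allocate_tokens_for_keyspace` is the 3.x setting that drives the even-distribution algorithm the blog post above describes):

```shell
# Sketch of the cassandra.yaml fragment for the new 32-token cluster.
# "my_keyspace" is a placeholder for your main RF=3 keyspace.
cat > cassandra_yaml_fragment.yaml <<'EOF'
num_tokens: 32
# Optimise token allocation for the replication factor of this keyspace;
# matters most at low vnode counts:
allocate_tokens_for_keyspace: my_keyspace
EOF
cat cassandra_yaml_fragment.yaml
```

Note the setting only influences token allocation when nodes bootstrap, so it has to be in place before the new nodes join.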
sstableloader & num_tokens change
Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4 node RF=3 cluster. I've read that 256 is not an optimal default num_tokens value, and that 32 is likely a better option. We have the "opportunity" to switch, as we're migrating environments and will likely be using sstableloader to do so. I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token nodes (otherwise same # of nodes and same RF). Thanks in advance.
Re: [EXTERNAL] Re: Sstableloader
It appears you have two goals you are trying to accomplish at the same time. My recommendation is to break it into two different steps. You need to decide if you are going to upgrade DSE or OSS.

* Upgrade DSE, then migrate to OSS:
  * Upgrade DSE to the version that matches the OSS 3.11.3 binary
  * Perform a datacenter switch
* Migrate to OSS, then upgrade:
  * Migrate to OSS using the version that matches the DSE Cassandra binary (DSE 5.0.7 = 3.0.11)
  * Upgrade OSS to the 3.11.3 binary

From: Rahul Reddy
Date: Thursday, May 30, 2019 at 6:37 AM
To: Cassandra User List
Cc: Anthony Goetz
Subject: [EXTERNAL] Re: Sstableloader

> Thank you Anthony and Jonathan. To add a new ring it doesn't have to be the same version of Cassandra, right? For example, DSE 5.1.2, which is 3.11.0, has sstables with the "mc" name, and Apache 3.11.3 also uses sstable names with "mc". We should still be able to add it to the ring, correct?
From: Jonathan Koppenhofer <j...@koppedomain.com>
Reply-To: Cassandra User List <user@cassandra.apache.org>
Date: Wednesday, May 29, 2019 at 6:45 PM
To: Cassandra User List <user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Sstableloader

> Has anyone tried to do a DC switch as a means to migrate from DataStax to OSS? This would be the safest route, as the ability to revert back to DataStax is easy. However, I'm curious how the dse_system keyspace would be replicated to OSS using their custom Everywhere strategy. You may have to change that to NetworkTopologyStrategy before firing up OSS nodes. Also, keep in mind if you restart any DSE nodes, it will revert that keyspace back to EverywhereStrategy.
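The replication-strategy change the plan above requires can be sketched as a CQL script; keyspace, DC name, and RF are placeholders for your topology, and as noted you may need to re-run it after any DSE node restart:

```shell
# Hedged sketch: generate the ALTER KEYSPACE statement to move a keyspace
# off EverywhereStrategy. "DC1" and RF 3 are placeholders; repeat for every
# keyspace still using Everywhere before any OSS node joins.
cat > fix_replication.cql <<'EOF'
ALTER KEYSPACE dse_system
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
EOF
# Against a live DSE node you would run:
#   cqlsh old-node1 -f fix_replication.cql
cat fix_replication.cql
```

Wiring this into the node init script (as Anthony's team did) keeps DSE restarts from silently reverting the strategy mid-migration.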
Re: Sstableloader
Thank you Anthony and Jonathan. To add a new ring it doesn't have to be the same version of Cassandra, right? For example, DSE 5.1.2, which is 3.11.0, has sstables with the "mc" name, and Apache 3.11.3 also uses sstable names with "mc". We should still be able to add it to the ring, correct?

On Wed, May 29, 2019, 9:55 PM Goetz, Anthony wrote:
> My team migrated from DSE to OSS a few years ago by doing a datacenter switch. You will need to update the replication strategy for all keyspaces that are using Everywhere to NetworkTopologyStrategy before adding any OSS nodes.
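Rahul's "mc" observation above can be turned into a quick sanity check: the SSTable filename prefix identifies the on-disk format version, so matching prefixes on both sides is a reasonable first compatibility test. A small illustration with a dummy file:

```shell
# Demonstration only: create a dummy SSTable filename and extract its
# format-version prefix ("mc" = the 3.0/3.11-era format).
mkdir -p demo/keyspace1/table1
touch demo/keyspace1/table1/mc-11-big-Data.db
for f in demo/keyspace1/table1/*-Data.db; do
  base=$(basename "$f")
  echo "${base%%-*}"   # prints "mc"
done
```

Matching prefixes are necessary but not sufficient — same-named formats from different vendors can still diverge in edge cases, which is why the replies recommend matching the exact binary version.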
Re: Sstableloader
Over the past year we've migrated several clusters from DSE to Apache Cassandra. We've mostly done in-place conversions node by node with no downtime, going from DSE 4.8.x to Apache Cassandra 2.1.x.

On Wed, May 29, 2019 at 8:55 PM Goetz, Anthony wrote:
> My team migrated from DSE to OSS a few years ago by doing a datacenter switch. You will need to update the replication strategy for all keyspaces that are using Everywhere to NetworkTopologyStrategy before adding any OSS nodes.
Re: Sstableloader
My team migrated from DSE to OSS a few years ago by doing a datacenter switch. You will need to update the replication strategy for all keyspaces that are using Everywhere to NetworkTopologyStrategy before adding any OSS nodes. As Jonathan mentioned, DSE nodes will revert this change on restart. To account for this, we modified our init script to call a cql script that would make sure the keyspaces were set back to NetworkTopologyStrategy.

High Level Plan:
* Find the DSE Cassandra binary version
* Review config to make sure you are not using any DSE-specific settings
* Update the replication strategy on keyspaces using Everywhere to NetworkTopologyStrategy
* Add an OSS DC using the same binary version as DSE
* Migrate clients to the new OSS DC
* Decommission the DSE DC

Note: OpsCenter will stop working once you add OSS nodes.

From: Jonathan Koppenhofer
Reply-To: Cassandra User List
Date: Wednesday, May 29, 2019 at 6:45 PM
To: Cassandra User List
Subject: [EXTERNAL] Re: Sstableloader

> Has anyone tried to do a DC switch as a means to migrate from DataStax to OSS? This would be the safest route, as the ability to revert back to DataStax is easy.
Re: Sstableloader
Has anyone tried to do a DC switch as a means to migrate from DataStax to OSS? This would be the safest route, as the ability to revert back to DataStax is easy. However, I'm curious how the dse_system keyspace would be replicated to OSS using their custom Everywhere strategy. You may have to change that to NetworkTopologyStrategy before firing up OSS nodes. Also, keep in mind if you restart any DSE nodes, it will revert that keyspace back to EverywhereStrategy.

I also posted a means to migrate in place on this mailing list a few months back (thanks for help from others on the mailing list), but it is a little more involved and risky. Let me know if you can't find it, and I'll dig it up.

Finally, DSE 5.0's open-source equivalent is 3.0.x. I recommend you go to OSS 3.0, then up to 3.11.

On Wed, May 29, 2019, 5:56 PM Nitan Kainth wrote:
> If the Cassandra version is the same, it should work.
Re: Sstableloader
If the Cassandra version is the same, it should work.

Regards,
Nitan
Cell: 510 449 9629

> On May 28, 2019, at 4:21 PM, Rahul Reddy wrote:
>
> Hello,
>
> Does sstableloader work between DataStax and Apache Cassandra? I'm trying to migrate DSE 5.0.7 to Apache 3.11.1.
Re: Sstableloader
Hello,

I can't answer this question about sstableloader (even though I think it should be OK). My understanding, even though I'm not really up to date with the latest DataStax work, is that DSE uses a modified but compatible version of Cassandra for everything that is not specifically a 'DSE feature'. In particular, I expect the SSTable format to be the same. sstableloader has always been slow and inefficient for me, though I did not use it much.

I think the way out of DSE should be documented somewhere in the DataStax docs; if not, I think you can ask DataStax directly (or maybe someone here can help you). My guess is that the safest way out, without any downtime, is probably to perform a datacenter 'switch':

- Identify the Apache Cassandra version used under the hood by DSE (5.0.7). Let's say it's 3.11.1 (I don't know).
- Add a new Apache Cassandra datacenter to your DSE cluster using this version (I would rather use 3.11.latest in this case though... 3.11.1 had memory leaks and other wild issues).
- Move clients to this new DC.
- Shut down the old DC.

I wrote a runbook to perform such an operation not that long ago; you can find it here: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html

I don't know for sure that this is the best way to go out of DSE, but that would be my guess and the first thing I would investigate (before sstableloader, clearly). Hope that helps, even though it does not directly answer the question (that I'm unable to answer) about SSTable & sstableloader compatibility with DSE clusters.

C*heers

On Tue, May 28, 2019 at 10:22 PM, Rahul Reddy wrote:
> Hello,
>
> Does sstableloader work between DataStax and Apache Cassandra? I'm trying to migrate DSE 5.0.7 to Apache 3.11.1?
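The two operational commands at the heart of the datacenter switch above can be sketched as follows; keyspace, DC names, and RF are placeholders, and the exact sequence (client switchover, repairs) is covered in the runbook:

```shell
# Hedged sketch of the DC-switch core steps; names are placeholders.
cat > dc_switch_steps.sh <<'EOF'
#!/bin/sh
# 1. Replicate each application keyspace to the new OSS DC as well:
cqlsh old-node1 -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dse_dc': 3, 'oss_dc': 3};"
# 2. On each node of the new DC, stream the existing data from the old DC:
nodetool rebuild -- dse_dc
# 3. Once clients are moved, drop the old DC from replication and
#    decommission its nodes.
EOF
cat dc_switch_steps.sh
```

`nodetool rebuild` streams data without the token-range validation of a bootstrap, which is what makes the new DC usable without downtime on the old one.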
Sstableloader
Hello,

Does sstableloader work between DataStax and Apache Cassandra? I'm trying to migrate DSE 5.0.7 to Apache 3.11.1.
re: Trouble restoring with sstableloader
This is a response to a message from 2017 that I found unanswered on the user list; we were getting the same error. In this Stack Overflow answer, https://stackoverflow.com/questions/53160611/frame-size-352518912-larger-than-max-length-15728640-exception-while-runnin/55751104#55751104, I have noted what we had to do to get things working.

In that case it appears the -tf and/or the various keystore/truststore params weren't supplied. In our case we weren't passing the -tf parameter... then we ran into the PKIX error.

Original message:
---
Hi all,

I've been running into the following issue while trying to restore a C* database via sstableloader:

Could not retrieve endpoint ranges:
org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95)
Caused by: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1327)
    at org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1315)
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:256)
    ... 2 more

This seems odd since the frame size thrift is asking for is over 336 MB. This is happening using Cassandra 2.0.12 | Thrift protocol 19.39.0.

Any advice? Thanks!
--Jim
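A sketch of the fix described above — pointing -tf at the SSL transport factory so the tool stops misreading the server's SSL handshake as a giant Thrift frame. Paths, passwords, and hosts are placeholders, and the flag names are from the 2.0-era tool, so check `sstableloader --help` on your version:

```shell
# Hedged sketch: sstableloader invocation with the Thrift SSL transport
# factory supplied via -tf, plus truststore/keystore options.
cat > load_with_ssl.sh <<'EOF'
#!/bin/sh
sstableloader -d node1,node2 \
  -tf org.apache.cassandra.thrift.SSLTransportFactory \
  -ts /path/to/truststore.jks -tspw changeit \
  -ks /path/to/keystore.jks -kspw changeit \
  /data/snapshots/keyspace1/table1
EOF
cat load_with_ssl.sh
```

The "frame size larger than max length" symptom is a useful tell: it usually means a plaintext Thrift client is reading encrypted bytes, not that any real frame is 336 MB.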
streaming errors with sstableloader
Hello community,

I'm receiving some strange streaming errors while trying to restore certain sstable snapshots with sstableloader to a new cluster. While the cluster is up and running and the nodes are communicating with each other, I can see streams failing to the nodes for no obvious reason, and the only exception thrown is:

ERROR 14:00:08,403 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Streaming error occurred on session with peer 10.35.81.88
java.lang.NullPointerException: null
    at org.apache.cassandra.db.SerializationHeader$Component.access$400(SerializationHeader.java:271) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.SerializationHeader$Serializer.serialize(SerializationHeader.java:445) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.FileMessageHeader$FileMessageHeaderSerializer.serialize(FileMessageHeader.java:216) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:94) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:52) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:50) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:408) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:380) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]

progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 2.652KiB/s (avg: 2.652KiB/s)
progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 0.000KiB/s (avg: 2.651KiB/s)
progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 0.000KiB/s (avg: 2.650KiB/s)

ERROR 14:00:08,406 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Streaming error occurred on session with peer 10.35.81.79
java.lang.NullPointerException: null [stack trace identical to the one above]

progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 0.000KiB/s (avg: 2.650KiB/s)

ERROR 14:00:08,407 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Remote peer 10.35.81.88 failed stream session.
ERROR 14:00:08,408 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Streaming error occurred on session with peer 10.35.81.76
java.lang.NullPointerException: null [stack trace identical to the one above]
Re: Problem with restoring a snapshot using sstableloader
On Mon, Dec 3, 2018 at 4:24 PM Oliver Herrmann wrote: > > You are right. The number of nodes in our cluster is equal to the > replication factor. For that reason I think it should be sufficient to call > sstableloader only from one node. > The next question is then: do you care about consistency of data restored from one snapshot? Is the snapshot taken after repair? Do you still write to those tables? In other words, your data will be consistent after restoring from one node's snapshot only if you were writing with consistency level ALL (or equal to your replication factor and, transitively, to the number of nodes). -- Oleksandr "Alex" Shulgin | Senior Software Engineer | Team Flux | Data Services | Zalando SE | Tel: +49 176 127-59-707
Re: Problem with restoring a snapshot using sstableloader
On Sun, Dec 2, 2018 at 6:24 AM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Fri, 30 Nov 2018, 17:54 Oliver Herrmann wrote:
> > When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.
>
> OK, not entirely sure that's a reasonable setup, but do you imply that with sstableloader you don't need to process every snapshot taken -- that is, also visiting every node? That would only be true if your replication factor equals the number of nodes, IMO.

You are right. The number of nodes in our cluster is equal to the replication factor. For that reason I think it should be sufficient to call sstableloader only from one node.
Re: Problem with restoring a snapshot using sstableloader
It's a bug in sstableloader introduced many years ago - before that, it worked as described in the documentation...

Oliver Herrmann at "Fri, 30 Nov 2018 17:05:43 +0100" wrote:

OH> Hi,
OH> I'm having some problems restoring a snapshot using sstableloader. I'm using Cassandra 3.11.1 and followed the instructions for creating and restoring from this page:
OH> https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/toolsBulkloader.html
OH>
OH> 1. Called nodetool cleanup on each node
OH>    $ nodetool cleanup cass_testapp
OH>
OH> 2. Called nodetool snapshot on each node
OH>    $ nodetool snapshot -t snap1 -kt cass_testapp.table3
OH>
OH> 3. Checked the data and snapshot folders:
OH>    $ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb
OH>    drwxr-xr-x 2 cassandra cassandra    6 Nov 29 03:54 backups
OH>    -rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
OH>    -rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
OH>    -rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
OH>    -rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
OH>    -rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
OH>    -rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
OH>    -rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
OH>    -rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
OH>    drwxr-xr-x 3 cassandra cassandra   18 Nov 30 10:30 snapshots
OH>
OH>    and
OH>
OH>    $ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
OH>    total 44
OH>    -rw-r--r-- 1 cassandra cassandra   32 Nov 30 10:30 manifest.json
OH>    -rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
OH>    -rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
OH>    -rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
OH>    -rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
OH>    -rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
OH>    -rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
OH>    -rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
OH>    -rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
OH>    -rw-r--r-- 1 cassandra cassandra 1043 Nov 30 10:30 schema.cql
OH>
OH> 4. Truncated the table
OH>    cqlsh:cass_testapp> TRUNCATE table3 ;
OH>
OH> 5. Tried to restore table3 on one cassandra node
OH>    $ sstableloader -d localhost /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
OH>    Established connection to initial hosts
OH>    Opening sstables and calculating sections to stream
OH>    Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist
OH>    Summary statistics:
OH>       Connections per host    : 1
OH>       Total files transferred : 0
OH>       Total bytes transferred : 0.000KiB
OH>       Total duration          : 2652 ms
OH>       Average transfer rate   : 0.000KiB/s
OH>       Peak transfer rate      : 0.000KiB/s
OH>
OH> I'm always getting the message "Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist". I also tried to rename the snapshots folder into the keyspace name (cass_testapp), but then I get the message "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".
OH>
OH> What am I doing wrong?
OH>
OH> Thanks
OH> Oliver

--
With best wishes, Alex Ott
Solutions Architect EMEA, DataStax
http://datastax.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
Re: Problem with restoring a snapshot using sstableloader
On Fri, 30 Nov 2018, 17:54 Oliver Herrmann wrote:

> When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.

OK, not entirely sure that's a reasonable setup, but do you imply that with sstableloader you don't need to process every snapshot taken -- that is, also visiting every node? That would only be true if your replication factor equals the number of nodes, IMO.

--
Alex
Re: Problem with restoring a snapshot using sstableloader
Thanks Dmitry, that solved my problem.

Oliver

Original message
Subject: Re: Problem with restoring a snapshot using sstableloader
From: Dmitry Saprykin
To: user@cassandra.apache.org

> You need to move your files into a directory named 'cass_testapp/table3/'. sstableloader uses the last 2 path components as the keyspace and table names.
>
> On Fri, Nov 30, 2018 at 11:54 AM Oliver Herrmann <o.herrmann...@gmail.com> wrote:
> > When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.
Re: Problem with restoring a snapshot using sstableloader
You need to move your files into a directory named 'cass_testapp/table3/'. sstableloader uses the last two path components as the keyspace and table names.

On Fri, Nov 30, 2018 at 11:54 AM Oliver Herrmann wrote:

> When using nodetool refresh I must have write access to the data folder
> and I have to do it on every node. In our production environment the user
> that would do the restore does not have write access to the data folder.
Re: Problem with restoring a snapshot using sstableloader
When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.

On Fri, 30 Nov 2018 at 17:39, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Fri, Nov 30, 2018 at 5:13 PM Oliver Herrmann wrote:
>
>> I'm always getting the message "Skipping file mc-11-big-Data.db: table
>> snapshots.table3 doesn't exist". I also tried to rename the snapshots
>> folder into the keyspace name (cass_testapp) but then I get the message
>> "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".
>
> Hi,
>
> I imagine moving the files from the snapshot directory to the data directory
> and then running `nodetool refresh` is the supported way. Why use
> sstableloader for that?
>
> --
> Alex
Re: Problem with restoring a snapshot using sstableloader
On Fri, Nov 30, 2018 at 5:13 PM Oliver Herrmann wrote:

> I'm always getting the message "Skipping file mc-11-big-Data.db: table
> snapshots.table3 doesn't exist". I also tried to rename the snapshots
> folder into the keyspace name (cass_testapp) but then I get the message
> "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".

Hi,

I imagine moving the files from the snapshot directory to the data directory and then running `nodetool refresh` is the supported way. Why use sstableloader for that?

--
Alex
Problem with restoring a snapshot using sstableloader
Hi,

I'm having some problems restoring a snapshot using sstableloader. I'm using Cassandra 3.11.1 and followed the instructions for creating and restoring a snapshot from this page: https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/toolsBulkloader.html

1. Called nodetool cleanup on each node

$ nodetool cleanup cass_testapp

2. Called nodetool snapshot on each node

$ nodetool snapshot -t snap1 -kt cass_testapp.table3

3. Checked the data and snapshot folders:

$ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb
drwxr-xr-x 2 cassandra cassandra    6 Nov 29 03:54 backups
-rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
-rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
-rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
-rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
-rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
drwxr-xr-x 3 cassandra cassandra   18 Nov 30 10:30 snapshots

and

$ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
total 44
-rw-r--r-- 1 cassandra cassandra   32 Nov 30 10:30 manifest.json
-rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
-rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
-rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
-rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
-rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
-rw-r--r-- 1 cassandra cassandra 1043 Nov 30 10:30 schema.cql

4. Truncated the table

cqlsh:cass_testapp> TRUNCATE table3 ;

5. Tried to restore table3 on one cassandra node

$ sstableloader -d localhost /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
Established connection to initial hosts
Opening sstables and calculating sections to stream
Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist
Summary statistics:
  Connections per host: 1
  Total files transferred : 0
  Total bytes transferred : 0.000KiB
  Total duration : 2652 ms
  Average transfer rate : 0.000KiB/s
  Peak transfer rate : 0.000KiB/s

I'm always getting the message "Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist". I also tried to rename the snapshots folder into the keyspace name (cass_testapp) but then I get the message "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist". What am I doing wrong?

Thanks
Oliver
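Dmitry's fix further up the thread (sstableloader infers the keyspace and table from the last two directory names of the path you pass it) can be sketched as the following shell steps. The staging location is illustrative, and a stand-in file is created here in place of a real snapshot:

```shell
# sstableloader infers <keyspace>.<table> from the LAST TWO path components,
# so copy the snapshot files into a <keyspace>/<table>/ staging directory
# before loading. Paths here are illustrative.
SNAP=/tmp/demo/table3-7227e480/snapshots/snap1   # stand-in snapshot dir
STAGE=/tmp/restore/cass_testapp/table3           # ends in <keyspace>/<table>

mkdir -p "$SNAP" "$STAGE"
touch "$SNAP/mc-11-big-Data.db"                  # stand-in for real sstables
cp "$SNAP"/mc-11-big-* "$STAGE"/

# With the path ending in cass_testapp/table3, the loader resolves the right
# keyspace and table (command shown, not executed in this sketch):
echo sstableloader -d localhost "$STAGE"
```

Loading straight from .../snapshots/snap1/ fails because the loader then reads "snapshots" as the keyspace and "snap1" as the table, which matches the "table snapshots.table3 doesn't exist" message above.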
Re: Exception when running sstableloader
Hello LAD,

I do not know much about the SSTable loader. I carefully stayed away from it so far :). But it seems it's using Thrift to talk to Cassandra. Some of your rows might be too big, and increasing 'thrift_framed_transport_size_in_mb' should indeed have helped. Did you / Would you try increasing this as well: 'thrift_max_message_length_in_mb' and see what happens?

Cheers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, 5 Nov 2018 at 18:00, Kalyan Chakravarthy wrote:

> I’m trying to migrate data between two clusters on different networks.
> Ports 7001, 7199, 9046 and 9160 are open between them, but port 7000 is not
> open. When I run the sstableloader command, I get the following exception.
>
> Command:
>
> :/a/cassandra/bin# ./sstableloader -d 192.168.98.99 /abc/cassandra/data/apps/ads-0fdd9ff0a7d711e89107ff9c3da22254
>
> Error/Exception:
>
> Could not retrieve endpoint ranges:
> org.apache.thrift.transport.TTransportException: Frame size (352518912)
> larger than max length (15728640)!
> [snip -- full stack trace in the original message below]
>
> In the yaml file, 'thrift_framed_transport_size_in_mb' is set to 15, so I
> increased its value to 40. Even after increasing
> 'thrift_framed_transport_size_in_mb' in the yaml file, I’m getting the same
> error.
>
> What could be the solution for this? Can somebody please help me with
> this?
>
> Cheers
> LAD
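Alain's suggestion amounts to raising both Thrift limits in cassandra.yaml on the nodes sstableloader talks to. A sketch of the two settings; the values are illustrative, and 'thrift_max_message_length_in_mb' only exists in older Cassandra versions (it was dropped from later 2.x yamls), so check your own cassandra.yaml before relying on it:

```yaml
# cassandra.yaml (Thrift-era settings; values are illustrative)
thrift_framed_transport_size_in_mb: 40
# Commonly set slightly larger than the framed transport size,
# if your Cassandra version still has this setting:
thrift_max_message_length_in_mb: 48
```

Both settings must be changed on the server side (and the nodes restarted); changing them only on the machine running sstableloader has no effect.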
Exception when running sstableloader
I’m trying to migrate data between two clusters on different networks. Ports 7001, 7199, 9046 and 9160 are open between them, but port 7000 is not open. When I run the sstableloader command, I get the following exception.

Command:

:/a/cassandra/bin# ./sstableloader -d 192.168.98.99 /abc/cassandra/data/apps/ads-0fdd9ff0a7d711e89107ff9c3da22254

Error/Exception:

Could not retrieve endpoint ranges:
org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:342)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:109)
Caused by: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1368)
at org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1356)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:304)
... 2 more

In the yaml file, 'thrift_framed_transport_size_in_mb' is set to 15, so I increased its value to 40. Even after increasing 'thrift_framed_transport_size_in_mb' in the yaml file, I’m getting the same error.

What could be the solution for this? Can somebody please help me with this?

Cheers
LAD
Info about sstableloader
Hi, I’m new to Cassandra, please help me with sstableloader. Thank you in advance.

I’m trying to migrate data between two clusters which are on different networks, moving data from ‘c1’ to ‘c2’. Which one will be the source and which one will be the destination? And where should I run the sstableloader command, on c1 or c2?

Cheers
LAD
Re: [EXTERNAL] Re: Nodetool refresh v/s sstableloader
Thank you, everyone, for responding.

Rajath Subramanyam

On Thu, Aug 30, 2018 at 8:38 AM Carl Mueller wrote:

> - A range-aware compaction strategy that subdivides data by token range
> could help for this: you only back up data for the primary node and not the
> replica data.
> - Yes, if you want to use nodetool refresh as some sort of recovery
> solution, MAKE SURE YOU STORE THE TOKEN LIST with the
> sstables/snapshots/backups for the nodes.
Re: [EXTERNAL] Re: Nodetool refresh v/s sstableloader
- A range-aware compaction strategy that subdivides data by token range could help for this: you only back up data for the primary node and not the replica data.
- Yes, if you want to use nodetool refresh as some sort of recovery solution, MAKE SURE YOU STORE THE TOKEN LIST with the sstables/snapshots/backups for the nodes.

On Wed, Aug 29, 2018 at 8:57 AM Durity, Sean R wrote:

> Sstableloader, though, could require a lot more disk space – until
> compaction can reduce it. For example, if your RF=3, you will essentially be
> loading 3 copies of the data. Then it will get replicated 3 more times as
> it is being loaded. Thus, you could need up to 9x disk space.
>
> Sean Durity
RE: [EXTERNAL] Re: Nodetool refresh v/s sstableloader
Sstableloader, though, could require a lot more disk space – until compaction can reduce it. For example, if your RF=3, you will essentially be loading 3 copies of the data. Then it will get replicated 3 more times as it is being loaded. Thus, you could need up to 9x disk space.

Sean Durity

From: kurt greaves
Sent: Wednesday, August 29, 2018 7:26 AM
To: User
Subject: [EXTERNAL] Re: Nodetool refresh v/s sstableloader

Removing dev...

Nodetool refresh only picks up new SSTables that have been placed in the table's directory. It doesn't account for actual ownership of the data like SSTableloader does. Refresh will only work properly if the SSTables you are copying in are completely covered by that node's tokens. It doesn't work if there's a change in topology; replication and token ownership will have to be more or less the same.

SSTableloader will break up the SSTables and send the relevant bits to whichever node needs it, so no need for you to worry about tokens and copying data to the right places; it will do that for you.

On 28 August 2018 at 11:27, Rajath Subramanyam wrote:

Hi Cassandra users, Cassandra dev,

When recovering using SSTables from a snapshot, I want to know what are the key differences between using:
1. Nodetool refresh and,
2. SSTableloader

Does nodetool refresh have restrictions that need to be met? Does nodetool refresh work even if there is a change in the topology between the source cluster and the destination cluster? Does it work if the token ranges don't match between the source cluster and the destination cluster? Does it work when an old SSTable in the snapshot has a dropped column that is not part of the current schema?

I appreciate any help in advance.

Thanks,
Rajath

Rajath Subramanyam

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized.
If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
Re: Nodetool refresh v/s sstableloader
Removing dev...

Nodetool refresh only picks up new SSTables that have been placed in the table's directory. It doesn't account for actual ownership of the data like SSTableloader does. Refresh will only work properly if the SSTables you are copying in are completely covered by that node's tokens. It doesn't work if there's a change in topology; replication and token ownership will have to be more or less the same.

SSTableloader will break up the SSTables and send the relevant bits to whichever node needs it, so no need for you to worry about tokens and copying data to the right places; it will do that for you.

On 28 August 2018 at 11:27, Rajath Subramanyam wrote:

> Hi Cassandra users, Cassandra dev,
>
> When recovering using SSTables from a snapshot, I want to know what are
> the key differences between using:
> 1. Nodetool refresh and,
> 2. SSTableloader
>
> Does nodetool refresh have restrictions that need to be met?
> Does nodetool refresh work even if there is a change in the topology
> between the source cluster and the destination cluster? Does it work if the
> token ranges don't match between the source cluster and the destination
> cluster? Does it work when an old SSTable in the snapshot has a dropped
> column that is not part of the current schema?
>
> I appreciate any help in advance.
>
> Thanks,
> Rajath
Nodetool refresh v/s sstableloader
Hi Cassandra users, Cassandra dev, When recovering using SSTables from a snapshot, I want to know what are the key differences between using: 1. Nodetool refresh and, 2. SSTableloader Does nodetool refresh have restrictions that need to be met? Does nodetool refresh work even if there is a change in the topology between the source cluster and the destination cluster? Does it work if the token ranges don't match between the source cluster and the destination cluster? Does it work when an old SSTable in the snapshot has a dropped column that is not part of the current schema? I appreciate any help in advance. Thanks, Rajath Rajath Subramanyam
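The distinction kurt describes above can be summarized as two restore paths. A minimal sketch, with host names, keyspace/table and paths assumed for illustration; commands are echoed via a helper rather than executed, since they only make sense against a live cluster:

```shell
# Helper that echoes commands; swap 'echo' for real execution on a cluster.
run() { echo "+ $*"; }

# (a) nodetool refresh: node-local. The sstables must already sit in the
# table's data directory on EACH node, and must fall within that node's
# token ranges -- topology/token changes between clusters break this.
run nodetool refresh cass_testapp table3

# (b) sstableloader: topology-aware. It reads the sstables, splits them by
# partition, and streams each piece to whichever replicas own it, so token
# ownership on the target cluster need not match the source.
run sstableloader -d 10.0.0.1,10.0.0.2 /backups/cass_testapp/table3
```

The trade-off, per the thread: refresh is cheap but brittle (store the token list with your backups); sstableloader is robust to topology change but temporarily multiplies disk usage, since RF copies get loaded and re-replicated.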
Re: Cassandra crashes after loading data with sstableloader
What’s the cardinality of hash? Do they have the same schema? If so you may be able to take a snapshot and hardlink it in / refresh instead of sstableloader. Alternatively you could drop the index from the destination keyspace and add it back in after the load finishes.

How big are the sstables? How big is your heap? Are you already serving traffic?

--
Jeff Jirsa

On Jul 29, 2018, at 3:43 PM, Rahul Singh wrote:

> What does “hash” data look like?
>
> Rahul
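Jeff's second suggestion (drop the secondary index before the bulk load, recreate it once the data is in) can be sketched as below. The index name, keyspace and host are assumptions, and the commands are echoed rather than executed:

```shell
run() { echo "+ $*"; }   # swap 'echo' for real execution

# Hypothetical index name: with CREATE INDEX ON message (hash), Cassandra
# typically generates message_hash_idx, but verify with DESCRIBE TABLE.
run cqlsh -e 'DROP INDEX keyspace2.message_hash_idx'

# Bulk load without the index in place, so no index writes happen per row.
run sstableloader -d 10.0.0.1 /snapshots/keyspace2/message

# Recreate the index afterwards; it is then built once, in the background.
run cqlsh -e 'CREATE INDEX message_hash_idx ON keyspace2.message (hash)'
```

This trades a one-time background index rebuild for not maintaining the index during the load, which is where the thread suspects the crash occurs.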
Re: Cassandra crashes after loading data with sstableloader
What does “hash” data look like?

Rahul

On Jul 24, 2018, 11:30 AM -0400, Arpan Khandelwal wrote:

> I need to clone data from one keyspace to another keyspace.
> We do it by taking a snapshot of keyspace1 and restoring it in keyspace2 using
> sstableloader.
Cassandra crashes after loading data with sstableloader
I need to clone data from one keyspace to another keyspace. We do it by taking a snapshot of keyspace1 and restoring it in keyspace2 using sstableloader.

Suppose we have the following table with an index on the hash column. The table has around 10M rows.

CREATE TABLE message (
    id uuid,
    messageid uuid,
    parentid uuid,
    label text,
    properties map,
    text1 text,
    text2 text,
    text3 text,
    category text,
    hash text,
    info map,
    creationtimestamp bigint,
    lastupdatedtimestamp bigint,
    PRIMARY KEY ( (id) )
);

CREATE INDEX ON message ( hash );

Cassandra crashes when I load data using sstableloader. The load is happening correctly, but it seems that Cassandra crashes when it's trying to build the index on a table with huge data.

I have two questions:
1. Is there any better way to clone a keyspace?
2. How can I optimize sstableloader to load data and not crash Cassandra while building the index?

Thanks
Arpan
Re: sstableloader from dse 4.8.4 to apache cassandra 3.11.1
Never mind, found it. It's not a supported version.

> On Jun 19, 2018, at 2:41 PM, rajpal reddy wrote:
>
> Hello,
>
> I’m trying to use sstableloader from DSE 4.8.4 (Cassandra 2.1.12) to Apache
> Cassandra 3.11.1 and I'm getting the below error, but it works fine when I use
> the sstableloader from DSE 5.1.2 (Apache Cassandra 3.11.0):
>
> Could not retrieve endpoint ranges:
> java.io.IOException: Failed to open transport to: host-ip:9160.
>
> Any workaround to use the sstableloader from DSE 4.8.4 (Apache Cassandra
> 2.1.12) with Apache Cassandra 3.11.1?
sstableloader from dse 4.8.4 to apache cassandra 3.11.1
Hello,

I’m trying to use sstableloader from DSE 4.8.4 (Cassandra 2.1.12) to Apache Cassandra 3.11.1 and I'm getting the below error, but it works fine when I use the sstableloader from DSE 5.1.2 (Apache Cassandra 3.11.0):

Could not retrieve endpoint ranges:
java.io.IOException: Failed to open transport to: host-ip:9160.

Any workaround to use the sstableloader from DSE 4.8.4 (Apache Cassandra 2.1.12) with Apache Cassandra 3.11.1?
Re: SSTableLoader Question
Sounds good. Thanks for the explanation!

On Sun, Feb 18, 2018 at 5:15 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:

> If you don’t have access to the file, you don’t have access to the file.
> I’ve seen this issue several times. It’s the easiest low-hanging fruit to
> resolve. So figure it out and make sure that it’s cassandra:cassandra from
> root to the data folder, and either run as root or sudo it.
>
> If it’s compacted it won’t be there, so you won’t have the file. I’m not
> aware of this event being communicated to sstableloader via SEDA. Besides,
> the sstable that you are loading SHOULD not be live. If you are streaming a
> live sstable, it means you are using sstableloader not as it is designed to
> be used - which is with static files.
Re: SSTableLoader Question
If you don’t have access to the file you don’t have access to the file. I’ve seen this issue several times. It’s he easiest low hanging fruit to resolve. So figure it out and make sure that it’s Cassandra.Cassandra from root to he Data folder and either run as root or sudo it. If it’s compacted it won’t be there so you won’t have the file. I’m not aware of this event being communicated to Sstableloader via SEDA. Besides, the sstable that you are loading SHOULD not be live. If you at streaming a life sstable, it means you are using sstableloader not as it is designed to be used - which is with static files. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 9:22 AM -0500, shalom sagges <shalomsag...@gmail.com>, wrote: > Not really sure with which user I ran it (root or cassandra), although I > don't understand why a permission issue will generate a File not Found > exception? > > And in general, what if a file is being streamed and got compacted before the > streaming ended. Does Cassandra know how to handle this? > > Thanks! > > > On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh <rahul.xavier.si...@gmail.com> > > wrote: > > > Check permissions maybe? Who owns the files vs. who is running > > > sstableloader. > > > > > > -- > > > Rahul Singh > > > rahul.si...@anant.us > > > > > > Anant Corporation > > > > > > On Feb 18, 2018, 4:26 AM -0500, shalom sagges <shalomsag...@gmail.com>, > > > wrote: > > > > Hi All, > > > > > > > > C* version 2.0.14. > > > > > > > > I was loading some data to another cluster using SSTableLoader. 
The > > > > streaming failed with the following error: > > > > > > > > Streaming error occurred > > > > java.lang.RuntimeException: java.io.FileNotFoundException: > > > > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file > > > > or directory) > > > > [stack trace snipped; the full trace is in the original message below] > > > > WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] > > > > Stream failed > > > > > > > > Did I miss something when running the load? Was the file suddenly > > > > missing due to compaction? > > > > If so, did I need to disable auto compaction or stop the service > > > > beforehand? (didn't find any reference to compaction in the docs) > > > > > > > > I know it's an old version, but I didn't find any related bugs on "File > > > > not found" exceptions. > > > > > > > > Thanks! > > > > >
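Rahul's theory (ownership/permissions on the source sstables) is cheap to test before rerunning the load. A minimal sketch, assuming a POSIX shell; the data path and the `cassandra` service user are assumptions, not facts from the thread:

```shell
# Verify the user who will run sstableloader can actually read every file
# in the source sstable directory. Reports any unreadable file; returns
# non-zero if one is found.
check_readable() {
  dir=$1
  bad=0
  for f in "$dir"/*; do
    [ -r "$f" ] || { echo "not readable: $f"; bad=1; }
  done
  return $bad
}

# Usage (hypothetical path and owner):
#   check_readable /data1/keyspace1/table1 \
#     || sudo chown -R cassandra:cassandra /data1/keyspace1/table1
```

Note this only catches permission problems; a file deleted by compaction mid-stream would pass this check and still fail later, which is why loading from a snapshot (see below in the thread) is the safer pattern.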
Re: SSTableLoader Question
Not really sure which user I ran it as (root or cassandra), although I don't understand why a permission issue would generate a File Not Found exception. And in general, what if a file that is being streamed gets compacted before the streaming ends? Does Cassandra know how to handle this? Thanks! On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote: > Check permissions maybe? Who owns the files vs. who is running > sstableloader. > > -- > Rahul Singh > rahul.si...@anant.us > > Anant Corporation > > On Feb 18, 2018, 4:26 AM -0500, shalom sagges <shalomsag...@gmail.com>, > wrote: > > Hi All, > > C* version 2.0.14. > > I was loading some data to another cluster using SSTableLoader. The > streaming failed with the following error: > > Streaming error occurred > java.lang.RuntimeException: java.io.FileNotFoundException: > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file > or directory) > [stack trace snipped; the full trace is in the original message below] > WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream > failed > > Did I miss something when running the load? Was the file suddenly missing > due to compaction? > If so, did I need to disable auto compaction or stop the service > beforehand? (didn't find any reference to compaction in the docs) > > I know it's an old version, but I didn't find any related bugs on "File > not found" exceptions. > > Thanks! > >
Re: SSTableLoader Question
Check permissions maybe? Who owns the files vs. who is running sstableloader. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 4:26 AM -0500, shalom sagges <shalomsag...@gmail.com>, wrote: > Hi All, > > C* version 2.0.14. > > I was loading some data to another cluster using SSTableLoader. The streaming > failed with the following error: > > Streaming error occurred > java.lang.RuntimeException: java.io.FileNotFoundException: > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or > directory) > [stack trace snipped; the full trace is in the original message below] > WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream > failed > > > > Did I miss something when running the load? Was the file suddenly missing due > to compaction? > If so, did I need to disable auto compaction or stop the service beforehand? > (didn't find any reference to compaction in the docs) > > I know it's an old version, but I didn't find any related bugs on "File not > found" exceptions. > > Thanks! > >
SSTableLoader Question
Hi All, C* version 2.0.14. I was loading some data to another cluster using SSTableLoader. The streaming failed with the following error:

Streaming error occurred
java.lang.RuntimeException: java.io.FileNotFoundException: /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or directory)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
    at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
    at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
    ... 8 more
WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream failed

Did I miss something when running the load? Was the file suddenly missing due to compaction? If so, did I need to disable auto compaction or stop the service beforehand? (didn't find any reference to compaction in the docs) I know it's an old version, but I didn't find any related bugs on "File not found" exceptions. Thanks!
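The compaction theory fits: `sstableloader` read a live data directory, and compaction deleted the sstable mid-stream. The usual remedy is to load from a snapshot rather than the live directory, because `nodetool snapshot` creates hard links, and a hard-linked copy survives deletion of the original. The hard-link mechanism can be demonstrated with plain files (paths here are illustrative, echoing the thread's naming):

```shell
# A snapshot directory entry is a hard link to the same inode, so
# "compaction" removing the live file leaves the snapshot copy intact.
tmp=$(mktemp -d)
echo "sstable bytes" > "$tmp/keyspace1-table1-jb-1-Data.db"
mkdir -p "$tmp/snapshots/bulkload"
ln "$tmp/keyspace1-table1-jb-1-Data.db" \
   "$tmp/snapshots/bulkload/keyspace1-table1-jb-1-Data.db"
rm "$tmp/keyspace1-table1-jb-1-Data.db"     # what compaction does to the live file
cat "$tmp/snapshots/bulkload/keyspace1-table1-jb-1-Data.db"  # snapshot copy still readable
```

In practice: `nodetool snapshot -t bulkload keyspace1`, then point `sstableloader` at the `snapshots/bulkload` directory instead of the live table directory.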
Trouble restoring with sstableloader
Hi all, I've been running into the following issue while trying to restore a C* database via sstableloader: Could not retrieve endpoint ranges: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)! java.lang.RuntimeException: Could not retrieve endpoint ranges: at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95) Caused by: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)! at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1327) at org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1315) at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:256) ... 2 more This seems odd since the frame size thrift is asking for is over 336 MB. This is happening using Cassandra 2.0.12 | Thrift protocol 19.39.0 Any advice? Thanks! --Jim
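The sizes in the exception are worth sanity-checking, since the "over 336 MB" figure is itself a diagnostic clue:

```shell
# Converting the two byte counts from the TTransportException to MB.
frame=352518912   # frame size the client tried to read
max=15728640      # Thrift's default max frame length (15 MB)
echo "requested frame: $((frame / 1024 / 1024)) MB"
echo "server max:      $((max / 1024 / 1024)) MB"
```

An absurdly large "frame size" like this often means the first bytes read off the socket were never a Thrift frame header at all, e.g. the loader connected to the wrong port or to an SSL-enabled endpoint without SSL. If the response genuinely is large, the server-side cap is (if memory serves) `thrift_framed_transport_size_in_mb` in cassandra.yaml; verify the setting name for your version before changing it.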
sstableloader out of memory
Hi all, We're trying to load a snapshot back into a cluster, but are running into memory issues. We've got about 190GB of data across 11 sstable-generations. Some of the smaller ones load, but the larger ones don't. We've tried increasing the max-heap-size to 16G, but still see this exception: sstableloader -d cass1 /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372 Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19968-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19930-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19966-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19960-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19944-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-9639-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19964-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-18879-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19965-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19967-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19959-Data.db to [] Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:257) at org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:70) at org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:58) at org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:378) at org.apache.cassandra.streaming.StreamCoordinator.transferFiles(StreamCoordinator.java:147) at org.apache.cassandra.streaming.StreamPlan.transferFiles(StreamPlan.java:144) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:185) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) Has anyone run into this before? The next steps we're going to try are running sstableloader on each generation individually (suspecting that it's trying to open all 11 generations at the same time). If that doesn't work we'll try sstablesplit, but we aren't that confident that would help, since it probably uses the same code to read the sstables as sstableloader and would also run out of memory. Thanks, Nathan
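The per-generation approach described above can be scripted. This is a hypothetical helper (the `stage_by_generation` name and `gen-N` directory layout are my own, not from the thread), assuming filenames follow the `<ks>-<cf>-ka-<generation>-<Component>.db` pattern seen in the listing:

```shell
# Move each sstable generation into its own staging directory so a
# separate sstableloader run only has to open one generation at a time.
stage_by_generation() {
  src=$1 dest=$2
  for f in "$src"/*-Data.db; do
    gen=$(basename "$f" | sed -E 's/.*-ka-([0-9]+)-Data\.db/\1/')
    mkdir -p "$dest/gen-$gen"
    # Move every component of the generation (Data, Index, Summary, ...),
    # not just the Data file.
    mv "$src"/*-ka-"$gen"-* "$dest/gen-$gen"/
  done
}
# then, roughly: for d in /staging/gen-*; do sstableloader -d cass1 "$d"; done
```

One caveat: `sstableloader` infers keyspace and table from the last two path components, so in practice each staging directory should end in `.../<keyspace>/<table>` rather than the `gen-N` name used here for illustration.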
sstableloader limitations in multi-dc cluster
I'm trying to use sstableloader to bulk load some data to my 4 DC cluster, and I can't quite get it to work. Here is how I'm trying to run it: sstableloader -d 127.0.0.1 -i {csv list of private ips of nodes in cluster} myks/mttest At first this seems to work, with a steady stream of logging like this (eventually getting to 100%): progress: [/10.0.1.225]0:13/13 100% [/10.0.0.134]0:13/13 100% [/10.0.0.119]0:13/13 100% [/10.0.1.26]0:13/13 100% [/10.0.3.188]0:13/13 100% [/10.0.3.189]0:13/13 100% [/10.0.2.95]0:13/13 100% total: 100% 0.000KiB/s (avg: 13.857MiB/s) There will be some errors sprinkled in like this: ERROR 15:35:43 [Stream #707f0920-5760-11e7-8ede-37de75ac1efa] Streaming error occurred on session with peer 10.0.2.9 java.net.NoRouteToHostException: No route to host Then, at the end, there will be one last warning about the failed streams: WARN 15:38:03 [Stream #707f0920-5760-11e7-8ede-37de75ac1efa] Stream failed Streaming to the following hosts failed: [/127.0.0.1, {list of same private ips as above}] I am perplexed about the failures because I am trying to explicitly ignore the nodes in remote DC's via the -i option to sstableloader. Why doesn't this work? I've tried using the public IP's instead just for kicks, but that doesn't change anything. I don't see anything helpful in the cassandra logs (including debug logs). Also, why is localhost in the list of failures? I can query the data locally after the sstableloader command completes. I've also noticed that sstableloader fails completely (even locally) while I am decomissioning or bootstrapping a node in a remote DC. Is this a limitation of sstableloader? I haven't been able to find documentation about this.
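One thing worth ruling out before blaming the `-i` option: `java.net.NoRouteToHostException` is an OS-level routing/firewall failure, raised before Cassandra logic is ever involved. Streaming goes to the storage port (7000 by default, or the SSL storage port). A quick pre-flight reachability loop, sketched here using bash's `/dev/tcp` (bash-specific; the port and IPs are assumptions to adapt):

```shell
# Report whether each target host accepts TCP connections on the
# streaming (storage) port, independently of sstableloader.
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 unreachable"
  fi
}
# e.g.: for ip in 10.0.1.225 10.0.2.9; do check_port "$ip" 7000; done
```

If the remote-DC nodes show as unreachable here, the stream failures are a connectivity problem regardless of what `-i` does or does not exclude.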
Re: sstableloader making no progress
Adding to the above, each host shows the following log messages that, despite being at INFO level, appear like stack traces to me: 2017-02-13 15:09:22,166 INFO [STREAM-INIT-/10.128.X.Y:60306] StreamResultFuture.java:116 - [Stream #afe548d0-f230-11e6-bc5d-8f99f25bfcf7, ID#0] Received streaming plan for Bulk Load at clojure.lang.Var.invoke(Var.java:401) at opsagent.config_service$update_system$fn__20140.invoke(config_service.clj:205) at clojure.core$reduce.invoke(core.clj:6518) at clojure.lang.RestFn.invoke(RestFn.java:425) at opsagent.config_service$fn__20217$fn__20218$state_machine__4128__auto20219$fn__20221.invoke(config_service.clj:250) at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:940) at clojure.core.async$ioc_alts_BANG_$fn__4293.invoke(async.clj:362) at clojure.core.async.impl.channels.ManyToManyChannel$fn__624.invoke(channels.clj:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.lang.Thread.run(Thread.java:745) 2017-02-13 15:09:22,208 INFO [STREAM-IN-/10.128.X.Y] StreamResultFuture.java:166 - [Stream #afe548d0-f230-11e6-bc5d-8f99f25bfcf7 ID#0] Prepare completed. Receiving 3 files(3963 bytes), sending 0 files(0 bytes) at clojure.lang.ArraySeq.reduce(ArraySeq.java:114) at opsagent.config_service$update_system.doInvoke(config_service.clj:199) at opsagent.config_service$start_system_BANG_.invoke(config_service.clj:224) at opsagent.config_service$fn__20217$fn__20218$state_machine__4128__auto20219.invoke(config_service.clj:247) at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:944) at clojure.core.async$do_alts$fn__4247$fn__4250.invoke(async.clj:231) at clojure.lang.AFn.run(AFn.java:22) Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Fri, Feb 10, 2017 at 4:28 PM, Simone Franzini <captainfr...@gmail.com> wrote: > I am trying to ingest some data from a cluster to a different cluster via > sstableloader. 
I am running DSE 4.8.7 / Cassandra 2.1.14. > I have re-created the schemas and followed other instructions here: > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/ > toolsBulkloader_t.html > > I am initially testing the ingest process with a single table, containing > 3 really small sstables (just a few KB each): > sstableloader -v -d / > From the console, it appears that the progress quickly reaches 100%, but > the command never returns: > progress: [/10.128.X.Y]0:3/3 100% [/10.192.Z.W]0:3/3 100% ... total: 100% > 0 MB/s(avg: 0 MB/s) > > nodetool netstats shows that there is no progress: > Mode: NORMAL > Bulk Load e495cea0-efde-11e6-9ec0-8f99f25bfcf7 > /10.128.X.Y > Receiving 3 files, 3963 bytes total. Already received 0 files, 0 > bytes total > Bulk Load b2566980-efb7-11e6-a467-8f99f25bfcf7 > /10.128.X.Y > Receiving 3 files, 3963 bytes total. Already received 0 files, 0 > bytes total > Bulk Load f31e7810-efdd-11e6-8484-8f99f25bfcf7 > /10.128.X.Y > Receiving 3 files, 3963 bytes total. Already received 0 files, 0 > bytes total > ... 
> Read Repair Statistics: > Attempted: 8 > Mismatch (Blocking): 0 > Mismatch (Background): 0 > Pool NameActive Pending Completed > Commandsn/a 02148112 > Responses n/a 0 977176 > > > The logs show the following, but no error or warning message: > 2017-02-10 16:18:49,096 INFO [STREAM-INIT-/10.128.X.Y:33302] > StreamResultFuture.java:109 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 > ID#0] Creating new streaming plan for Bulk Load > 2017-02-10 16:18:49,105 INFO [STREAM-INIT-/10.128.X.Y:33302] > StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, > ID#0] Received streaming plan for Bulk Load > 2017-02-10 16:18:49,110 INFO [STREAM-INIT-/10.128.X.Y:33306] > StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, > ID#0] Received streaming plan for Bulk Load > 2017-02-10 16:18:49,110 INFO [STREAM-IN-/10.128.X.Y] > StreamResultFuture.java:166 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 > ID#0] Prepare completed. Receiving 3 files(3963 bytes), sending 0 files(0 > bytes) > > > Any help would be greatly appreciated. > > Simone Franzini, PhD > > http://www.linkedin.com/in/simonefranzini >
sstableloader making no progress
I am trying to ingest some data from a cluster to a different cluster via sstableloader. I am running DSE 4.8.7 / Cassandra 2.1.14. I have re-created the schemas and followed other instructions here: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html I am initially testing the ingest process with a single table, containing 3 really small sstables (just a few KB each): sstableloader -v -d / From the console, it appears that the progress quickly reaches 100%, but the command never returns: progress: [/10.128.X.Y]0:3/3 100% [/10.192.Z.W]0:3/3 100% ... total: 100% 0 MB/s(avg: 0 MB/s) nodetool netstats shows that there is no progress: Mode: NORMAL Bulk Load e495cea0-efde-11e6-9ec0-8f99f25bfcf7 /10.128.X.Y Receiving 3 files, 3963 bytes total. Already received 0 files, 0 bytes total Bulk Load b2566980-efb7-11e6-a467-8f99f25bfcf7 /10.128.X.Y Receiving 3 files, 3963 bytes total. Already received 0 files, 0 bytes total Bulk Load f31e7810-efdd-11e6-8484-8f99f25bfcf7 /10.128.X.Y Receiving 3 files, 3963 bytes total. Already received 0 files, 0 bytes total ... Read Repair Statistics: Attempted: 8 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool Name Active Pending Completed Commands n/a 0 2148112 Responses n/a 0 977176 The logs show the following, but no error or warning message: 2017-02-10 16:18:49,096 INFO [STREAM-INIT-/10.128.X.Y:33302] StreamResultFuture.java:109 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 ID#0] Creating new streaming plan for Bulk Load 2017-02-10 16:18:49,105 INFO [STREAM-INIT-/10.128.X.Y:33302] StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, ID#0] Received streaming plan for Bulk Load 2017-02-10 16:18:49,110 INFO [STREAM-INIT-/10.128.X.Y:33306] StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, ID#0] Received streaming plan for Bulk Load 2017-02-10 16:18:49,110 INFO [STREAM-IN-/10.128.X.Y] StreamResultFuture.java:166 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 ID#0] Prepare completed. Receiving 3 files(3963 bytes), sending 0 files(0 bytes) Any help would be greatly appreciated. Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini
Re: [Marketing Mail] Re: [Marketing Mail] Re: sstableloader question
Hello, It's about 2500 sstables worth 25TB of data. The -t parameter doesn't change anything; I tried both -t 1000 and -t 1. Most probably I'm hitting some limitation at the target cluster. I'm preparing to split the sstables and run up to ten parallel sstableloader sessions. Regards, Osman On 11-10-2016 21:46, Rajath Subramanyam wrote: How many sstables are you trying to load? Running sstableloaders in parallel will help. Did you try setting the "-t" parameter and see if you are getting the expected throughput? - Rajath Rajath Subramanyam On Mon, Oct 10, 2016 at 2:02 PM, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: Hello, Thank you Adam and Rajath. I'll split the input sstables and run parallel jobs for each. I tested this approach and ran 3 parallel sstableloader jobs without the -t parameter. I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 to 600 Mbit/sec on all of the target nodes. But each job runs at about 10MB/sec only and generates about 100Mbit/sec of network traffic. In total this could be much more. Source and target servers have plenty of unused cpu, io and network resource. Do you have any idea how I can increase the speed of the sstableloader job? Regards, Osman On 10-10-2016 22:05, Rajath Subramanyam wrote: Hi Osman, You cannot restart the streaming only to the failed nodes specifically. You can restart the sstableloader job itself. Compaction will eventually take care of the redundant rows. - Rajath Rajath Subramanyam On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: It'll start over from the beginning. On Sunday, October 9, 2016, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: Hello, I have a running sstableloader job. Unfortunately some of the nodes restarted since streaming began.
I see streaming stopped for those nodes. Can I restart that streaming somehow? Or if I restart the sstableloader job, will it start from the beginning? Regards, Osman This e-mail message, including any attachments, is for the sole use of the person to whom it has been sent, and may contain information that is confidential or legally protected. If you are not the intended recipient or have received this message in error, you are not authorized to copy, distribute, or otherwise use this message or its attachments. Please notify the sender immediately by return e-mail and permanently delete this message and any attachments. KRON makes no warranty that this e-mail is error or virus free. -- Adam Hutson Data Architect | DataScale +1 (417) 224-5212 a...@datascale.io
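A back-of-envelope check of the numbers in this exchange helps localize the bottleneck (the interpretation is mine, not Osman's):

```shell
# Three loaders at ~10 MB/s each: how does the aggregate compare with the
# 600 Mbit/s stream_throughput_outbound_megabits_per_sec cap on the targets?
mb_per_sec=10
jobs=3
echo "per job:   $(( mb_per_sec * 8 )) Mbit/s"
echo "aggregate: $(( mb_per_sec * 8 * jobs )) Mbit/s"
```

At roughly 240 Mbit/s aggregate, the jobs sit well under the raised 600 Mbit/s server-side cap, which suggests a per-session limit (loader-side throttling, single-stream socket/CPU throughput, or a cap on some intermediate hop) rather than `stream_throughput_outbound_megabits_per_sec` itself.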
Re: [Marketing Mail] Re: sstableloader question
How many sstables are you trying to load? Running sstableloaders in parallel will help. Did you try setting the "-t" parameter and see if you are getting the expected throughput? - Rajath Rajath Subramanyam On Mon, Oct 10, 2016 at 2:02 PM, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: > Hello, > > Thank you Adam and Rajath. > > I'll split the input sstables and run parallel jobs for each. > I tested this approach and ran 3 parallel sstableloader jobs without the -t > parameter. > I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 > to 600 Mbit/sec on all of the target nodes. > But each job runs at about 10MB/sec only and generates about 100Mbit/sec of > network traffic. > In total this could be much more. Source and target servers have plenty of > unused cpu, io and network resource. > Do you have any idea how I can increase the speed of the sstableloader job? > > Regards, > Osman > > On 10-10-2016 22:05, Rajath Subramanyam wrote: > Hi Osman, > > You cannot restart the streaming only to the failed nodes specifically. > You can restart the sstableloader job itself. Compaction will eventually > take care of the redundant rows. > > - Rajath > > > Rajath Subramanyam > > > On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: > It'll start over from the beginning. > > > On Sunday, October 9, 2016, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: > Hello, > > I have a running sstableloader job. > Unfortunately some of the nodes restarted since streaming began. > I see streaming stopped for those nodes. > Can I restart that streaming somehow? > Or if I restart the sstableloader job, will it start from the beginning? > > Regards, > Osman > > -- > > Adam Hutson > Data Architect | DataScale > +1 (417) 224-5212 > a...@datascale.io >
Re: [Marketing Mail] Re: sstableloader question
Hello, Thank you Adam and Rajath. I'll split the input sstables and run parallel jobs for each. I tested this approach and ran 3 parallel sstableloader jobs without the -t parameter. I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 to 600 Mbit/sec on all of the target nodes. But each job runs at about 10MB/sec only and generates about 100Mbit/sec of network traffic. In total this could be much more. Source and target servers have plenty of unused cpu, io and network resource. Do you have any idea how I can increase the speed of the sstableloader job? Regards, Osman On 10-10-2016 22:05, Rajath Subramanyam wrote: Hi Osman, You cannot restart the streaming only to the failed nodes specifically. You can restart the sstableloader job itself. Compaction will eventually take care of the redundant rows. - Rajath Rajath Subramanyam On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: It'll start over from the beginning. On Sunday, October 9, 2016, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: Hello, I have a running sstableloader job. Unfortunately some of the nodes restarted since streaming began. I see streaming stopped for those nodes. Can I restart that streaming somehow? Or if I restart the sstableloader job, will it start from the beginning? Regards, Osman -- Adam Hutson Data Architect | DataScale +1 (417) 224-5212 a...@datascale.io
Re: sstableloader question
Hi Osman, You cannot restart the streaming only to the failed nodes specifically. You can restart the sstableloader job itself. Compaction will eventually take care of the redundant rows. - Rajath Rajath Subramanyam On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: > It'll start over from the beginning. > > On Sunday, October 9, 2016, Osman YOZGATLIOGLU < > osman.yozgatlio...@krontech.com> wrote: > >> Hello, >> >> I have a running sstableloader job. >> Unfortunately some of the nodes restarted after streaming began. >> I see streaming has stopped for those nodes. >> Can I restart that streaming somehow? >> Or if I restart the sstableloader job, will it start from the beginning? >> >> Regards, >> Osman >> > > -- > > Adam Hutson > Data Architect | DataScale > +1 (417) 224-5212 > a...@datascale.io >
Re: sstableloader question
It'll start over from the beginning. On Sunday, October 9, 2016, Osman YOZGATLIOGLU < osman.yozgatlio...@krontech.com> wrote: > Hello, > > I have a running sstableloader job. > Unfortunately some of the nodes restarted after streaming began. > I see streaming has stopped for those nodes. > Can I restart that streaming somehow? > Or if I restart the sstableloader job, will it start from the beginning? > > Regards, > Osman > -- Adam Hutson Data Architect | DataScale +1 (417) 224-5212 a...@datascale.io
sstableloader question
Hello, I have a running sstableloader job. Unfortunately some of the nodes restarted after streaming began. I see streaming has stopped for those nodes. Can I restart that streaming somehow? Or if I restart the sstableloader job, will it start from the beginning? Regards, Osman
Re: sstableloader
Thank you for your answer Kai. On 17 Aug 2016, at 11:34, Kai Wang <dep...@gmail.com> wrote: yes, you are correct. On Tue, Aug 16, 2016 at 2:37 PM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote: Hi, I’m using Cassandra 3.7. In the documentation for sstableloader I read the following: << Note: To get the best throughput from SSTable loading, you can use multiple instances of sstableloader to stream across multiple machines. No hard limit exists on the number of SSTables that sstableloader can run at the same time, so you can add additional loaders until you see no further improvement.>> Does this mean that I can stream my sstables to my cluster from many instances of sstableloader running simultaneously on many client machines? I ask because I would like to improve the transfer speed of my sstables to my cluster. Kind regards and thanks for your comments. Jean
Re: sstableloader
yes, you are correct. On Tue, Aug 16, 2016 at 2:37 PM, Jean Tremblay < jean.tremb...@zen-innovations.com> wrote: > Hi, > > I’m using Cassandra 3.7. > > In the documentation for sstableloader I read the following: > > << Note: To get the best throughput from SSTable loading, you can use > multiple instances of sstableloader to stream across multiple machines. No > hard limit exists on the number of SSTables that sstableloader can run at > the same time, so you can add additional loaders until you see no further > improvement.>> > > Does this mean that I can stream my sstables to my cluster from many > instances of sstableloader running simultaneously on many client machines? > > I ask because I would like to improve the transfer speed of my sstables to > my cluster. > > Kind regards and thanks for your comments. > > Jean >
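As a rough sketch of how SSTables might be divided between loader instances, the round-robin assignment below uses placeholder basenames; in practice each "slot" would be a directory holding complete SSTable component sets (Data, Index, Summary, ...), each fed to its own sstableloader process:

```shell
# Assign SSTable sets round-robin to N loader slots. The basenames are
# placeholders for real component sets such as la-1-big-Data.db.
N=3
slot=0
assignments=""
for sstable in la-1-big la-2-big la-3-big la-4-big la-5-big; do
  assignments="$assignments $sstable:loader$((slot + 1))"
  slot=$(( (slot + 1) % N ))
done
echo "$assignments"
# Each loader slot then runs its own process against its own directory:
#   sstableloader -d ip1,ip2,ip3 loader1/keyspace1/table1 &
```

Splitting by whole SSTables keeps each component set intact, which sstableloader requires.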
Re: Restoring Incremental Backups without using sstableloader
Hi, Well, you can do it by copying/pasting all the sstables as described in the link you gave, as long as your token range distribution has not changed since you took the snapshots and you have a way to be sure which node each sstable belongs to. Make sure that snapshots taken on node X indeed go back to node X. If you do not have information on where each sstable comes from, or if you added / removed nodes, then using sstableloader is probably a good idea. If you really don't like sstableloader (not sure why), you can paste all the sstables to all the nodes and then run nodetool refresh + nodetool cleanup. But in most cases all the data won't fit on one node, plus you might have identical sstable names that you'll have to handle. Hope that helps, C*heers, --- Alain Rodriguez - al...@thelastpickle.com France The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com 2016-05-17 11:14 GMT+01:00 Ravi Teja A V <avt...@gmail.com>: > Hi everyone > > I am currently working with Cassandra 3.5. I would like to know if it is > possible to restore backups without using sstableloader. I have been > referring to the following pages in the datastax documentation: > > https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsBackupSnapshotRestore.html > Thank you. > > Yours sincerely > RAVI TEJA A V >
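The copy-back alternative described above could look roughly like this; the keyspace, table, backup path and the UUID-suffixed data directory are all placeholders, and the snippet only assembles the steps as text (a dry run) rather than executing them:

```shell
# Dry-run sketch: restore snapshot sstables without sstableloader by
# copying them back to the SAME node they were taken from. All paths and
# the table directory UUID are placeholders.
ks=keyspace1
tbl=table1
datadir="/var/lib/cassandra/data/$ks/$tbl-8bcd2300d0d011e5a3ab233f92747e94"
steps="cp /backups/$ks/$tbl/* $datadir/
nodetool refresh $ks $tbl
nodetool cleanup $ks"  # cleanup matters mainly if sstables were pasted to every node
printf '%s\n' "$steps"
```

nodetool refresh makes the node pick up sstables dropped into its data directory; cleanup then discards rows outside the node's token ranges.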
Does sstableloader still use gossip?
Hi, in the docs it still says that sstableloader uses gossip ( https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html http://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsBulkloader.html ) but this blog post ( http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated) says „sstableloader no longer participates in gossip membership to get schema and ring information.“ While the blog post makes total sense, I wonder why it's still in the docs. Is a correctly configured cassandra.yaml necessary to use sstableloader, or are the hosts specified with the -d option enough? Thanks -- Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0) 172.1702676 www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de Sitz der Gesellschaft: Solingen | HRB 25917 | Amtsgericht Wuppertal Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen Schütz This e-mail, including any attached files, contains confidential and/or legally protected information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail and any attached files. Unauthorized copying, use or opening of attached files, as well as unauthorized forwarding of this e-mail, is not permitted.
Re: sstableloader: Stream failed
Thanks for the hint! Indeed I could not telnet to the host. It was the listen_address that was not properly configured. Thanks again! Ralf > On 23.05.2016, at 21:01, Paulo Motta <pauloricard...@gmail.com> wrote: > > Can you telnet 10.211.55.8 7000? This is the port used for streaming > communication with the destination node. > > If not you should check what is the configured storage_port in the > destination node and set that in the cassandra.yaml of the source node so > it's picked up by sstableloader. >
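The reachability check that solved this can be scripted instead of using telnet interactively. A small sketch (requires bash for the /dev/tcp device; the host and port are the examples from this thread):

```shell
# Probe whether a TCP port is reachable via bash's /dev/tcp device
# (bash must be installed). Prints "open" or "closed".
probe() {
  if bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}
# From the sstableloader host, the destination's storage_port must report
# open, e.g.: probe 10.211.55.8 7000
```

If the probe reports closed, check listen_address, storage_port and any firewall between the loader host and the node.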
Re: sstableloader: Stream failed
Can you telnet 10.211.55.8 7000? This is the port used for streaming communication with the destination node. If not, you should check what the configured storage_port is on the destination node and set that in the cassandra.yaml of the source node so it's picked up by sstableloader. 2016-05-23 10:48 GMT-03:00 Ralf Steppacher <ralf.viva...@gmail.com>: > Hello, > > I am trying to load the SSTables (from a Titan graph keyspace) of a > one-node-cluster (C* v2.2.6) into another node, but I cannot figure out how > to properly use the sstableloader. The target keyspace and table exist in > the target node. If they do not exist I get a proper error message telling > me so. > Providing a cassandra.yaml or not makes no difference. > The listen_address and rpc_address values in the cassandra.yaml, if > provided, do not seem to matter (at least the error is always the same). > Running sstableloader on the C* node itself or another host makes no > difference. > Truncating all tables before attempting to load the data makes no > difference. > > The node is up and running: > INFO 13:41:18 Starting listening for CQL clients on /10.211.55.8:9042... > INFO 13:41:18 Binding thrift service to /10.211.55.8:9160 > INFO 13:41:18 Listening for thrift clients... > > > The error I am getting is this: > > $ ./sstableloader -d 10.211.55.8 -f ../conf/cassandra.yaml -v ~/Downloads/ > ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/ > objc[18941]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/bin/java > and > /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined.
> Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /Users/rsteppac/Downloads/ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/la-1-big-Data.db > to [/10.211.55.8] > ERROR 12:57:24 [Stream #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Streaming > error occurred > java.net.ConnectException: Connection refused > at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_77] > at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_77] > at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_77] > at > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > ~[na:1.8.0_77] > at java.nio.channels.SocketChannel.open(SocketChannel.java:189) > ~[na:1.8.0_77] > at > org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:248) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:83) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:235) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) > [apache-cassandra-2.2.6.jar:2.2.6] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_77] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_77] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77] > progress: total: 100% 0 MB/s(avg: 0 MB/s)WARN 12:57:24 [Stream > #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Stream failed > Streaming to the following hosts failed: > [/10.211.55.8] > java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream 
failed > at > com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > at > com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:115) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > at > com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > at > com.
sstableloader: Stream failed
Hello, I am trying to load the SSTables (from a Titan graph keyspace) of a one-node-cluster (C* v2.2.6) into another node, but I cannot figure out how to properly use the sstableloader. The target keyspace and table exist in the target node. If they do not exist I get a proper error message telling me so. Providing a cassandra.yaml or not makes no difference. The listen_address and rpc_address values in the cassandra.yaml, if provided, do not seem to matter (at least the error is always the same). Running sstableloader on the C* node itself or another host makes no difference. Truncating all tables before attempting to load the data makes no difference. The node is up and running: INFO 13:41:18 Starting listening for CQL clients on /10.211.55.8:9042... INFO 13:41:18 Binding thrift service to /10.211.55.8:9160 INFO 13:41:18 Listening for thrift clients... The error I am getting is this: $ ./sstableloader -d 10.211.55.8 -f ../conf/cassandra.yaml -v ~/Downloads/ ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/ objc[18941]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined.
Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /Users/rsteppac/Downloads/ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/la-1-big-Data.db to [/10.211.55.8] ERROR 12:57:24 [Stream #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Streaming error occurred java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_77] at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_77] at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_77] at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_77] at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_77] at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:248) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:83) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:235) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) [apache-cassandra-2.2.6.jar:2.2.6] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77] progress: total: 100% 0 MB/s(avg: 0 MB/s)WARN 12:57:24 [Stream #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Stream failed Streaming to the following hosts failed: [/10.211.55.8] java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:115) Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210) at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186) at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:434) at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:529) at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:241
Restoring Incremental Backups without using sstableloader
Hi everyone I am currently working with Cassandra 3.5. I would like to know if it is possible to restore backups without using sstableloader. I have been referring to the following pages in the datastax documentation: https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsBackupSnapshotRestore.html Thank you. Yours sincerely RAVI TEJA A V
Re: sstableloader throughput
On Mon, Jan 11, 2016 at 10:25 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Make sure streaming throughput isn’t throttled on the destination cluster. > How do I do that? Is stream_throughput_outbound_megabits_per_sec the attribute in cassandra.yaml? I think we can set that on the fly using nodetool setstreamthroughput. I ran nodetool setstreamthroughput 0 on the target machine, but that doesn't improve the average throughput. Thanks and Regards Noorul > Stream from more machines (divide sstables between a bunch of machines, run > in parallel). > > > > > > > > On 1/11/16, 5:21 AM, "Noorul Islam K M" <noo...@noorul.com> wrote: > >> >>I have a need to stream data to a new cluster using sstableloader. I >>spawned a machine with 32 cores assuming that sstableloader scaled with >>respect to cores, but that doesn't appear to be the case. >> >>I am getting an average throughput of 18 MB/s, which seems to be pretty >>low (I might be wrong). >> >>Is there any way to increase the throughput? OpsCenter data on the target >>cluster shows very few write requests per second. >> >>Thanks and Regards >>Noorul
sstableloader throughput
I have a need to stream data to a new cluster using sstableloader. I spawned a machine with 32 cores assuming that sstableloader scaled with respect to cores, but that doesn't appear to be the case. I am getting an average throughput of 18 MB/s, which seems to be pretty low (I might be wrong). Is there any way to increase the throughput? OpsCenter data on the target cluster shows very few write requests per second. Thanks and Regards Noorul
Re: sstableloader throughput
Make sure streaming throughput isn’t throttled on the destination cluster. Stream from more machines (divide sstables between a bunch of machines, run in parallel). On 1/11/16, 5:21 AM, "Noorul Islam K M" <noo...@noorul.com> wrote: > >I have a need to stream data to a new cluster using sstableloader. I >spawned a machine with 32 cores assuming that sstableloader scaled with >respect to cores, but that doesn't appear to be the case. > >I am getting an average throughput of 18 MB/s, which seems to be pretty >low (I might be wrong). > >Is there any way to increase the throughput? OpsCenter data on the target >cluster shows very few write requests per second. > >Thanks and Regards >Noorul
Re: why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
You only need the patched sstableloader. You don't have to upgrade your Cassandra servers at all. So, 1. fetch the latest cassandra-2.1 source $ git clone https://git-wip-us.apache.org/repos/asf/cassandra.git $ cd cassandra $ git checkout origin/cassandra-2.1 2. build it $ ant 3. use the sstableloader you just built $ bin/sstableloader On Mon, Dec 28, 2015 at 6:03 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: > hi, Yuki > Thank you very much! > The issue's description almost fits my case! > 1. My Cassandra version is 2.1.11 > 2. my table has several columns with collection types > 3. Before it failed this time, I could use sstableloader to load data > into this table, but > I got this error after I dropped one column with a collection type and > added a column with int type > Do you think it will resolve my problem if I update the version to > 2.1.13? > > Also, my table already has 560 million records. So, to resolve this, > do I only need to update to the new version's C*.jar > and restart Cassandra? > > Dillon > > 2015-12-29 7:36 GMT+08:00 Yuki Morishita <mor.y...@gmail.com>: >> >> This is a known issue. >> >> https://issues.apache.org/jira/browse/CASSANDRA-10700 >> >> It is fixed in the not-yet-released version 2.1.13. >> So, you need to build from the latest cassandra-2.1 branch to try it.
>> >> >> On Mon, Dec 28, 2015 at 5:28 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: >> > hi, all >> > I used the sstableloader many times successfully, but I got the >> > following >> > error: >> > >> > [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user >> > -pw >> > password -v -d 172.21.0.131 ./currentdata/keyspace/table >> > >> > Could not retrieve endpoint ranges: >> > java.lang.IllegalArgumentException >> > java.lang.RuntimeException: Could not retrieve endpoint ranges: >> > at >> > >> > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338) >> > at >> > >> > org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156) >> > at >> > org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) >> > Caused by: java.lang.IllegalArgumentException >> > at java.nio.Buffer.limit(Buffer.java:267) >> > at >> > >> > org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543) >> > at >> > >> > org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124) >> > at >> > >> > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101) >> > at >> > >> > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30) >> > at >> > >> > org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50) >> > at >> > >> > org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68) >> > at >> > >> > org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287) >> > at >> > >> > org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833) >> > at >> > >> > org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126) >> > at >> > >> > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330) >> > ... 
2 more >> > >> > I don't know whether this error is relative to one of cluster nodes' >> > linux >> > crash? >> > >> > Any advice will be appreciated! >> > >> > Dillon Peng >> >> >> >> -- >> Yuki Morishita >> t:yukim (http://twitter.com/yukim) > > -- Yuki Morishita t:yukim (http://twitter.com/yukim)
why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
hi, all I used sstableloader many times successfully, but this time I got the following error: [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw password -v -d 172.21.0.131 ./currentdata/keyspace/table Could not retrieve endpoint ranges: java.lang.IllegalArgumentException java.lang.RuntimeException: Could not retrieve endpoint ranges: at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:267) at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543) at org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124) at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101) at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30) at org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50) at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68) at org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287) at org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833) at org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126) at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330) ... 2 more I don't know whether this error is related to a Linux crash on one of the cluster nodes. Any advice will be appreciated! Dillon Peng
Re: why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
This is known issue. https://issues.apache.org/jira/browse/CASSANDRA-10700 It is fixed in not-yet-released version 2.1.13. So, you need to build from the latest cassandra-2.1 branch to try. On Mon, Dec 28, 2015 at 5:28 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: > hi, all > I used the sstableloader many times successfully, but I got the following > error: > > [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw > password -v -d 172.21.0.131 ./currentdata/keyspace/table > > Could not retrieve endpoint ranges: > java.lang.IllegalArgumentException > java.lang.RuntimeException: Could not retrieve endpoint ranges: > at > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338) > at > org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) > Caused by: java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Buffer.java:267) > at > org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543) > at > org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124) > at > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101) > at > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30) > at > org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50) > at > org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68) > at > org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287) > at > org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833) > at > org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126) > at > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330) > ... 2 more > > I don't know whether this error is relative to one of cluster nodes' linux > crash? 
> > Any advice will be appreciated! > > Dillon Peng -- Yuki Morishita t:yukim (http://twitter.com/yukim)
Re: why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
hi, Yuki Thank you very much! The issue's description almost fits my case! 1. My Cassandra version is 2.1.11 2. my table has several columns with collection types 3. Before it failed this time, I could use sstableloader to load data into this table, but I got this error after I dropped one column with a collection type and added a column with int type. Do you think it will resolve my problem if I update the version to 2.1.13? Also, my table already has 560 million records. So, to resolve this, do I only need to update to the new version's C*.jar and restart Cassandra? Dillon 2015-12-29 7:36 GMT+08:00 Yuki Morishita <mor.y...@gmail.com>: > This is a known issue. > > https://issues.apache.org/jira/browse/CASSANDRA-10700 > > It is fixed in the not-yet-released version 2.1.13. > So, you need to build from the latest cassandra-2.1 branch to try it. > > On Mon, Dec 28, 2015 at 5:28 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: > > hi, all > > I used the sstableloader many times successfully, but I got the > following error: > > > > [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw > > password -v -d 172.21.0.131 ./currentdata/keyspace/table > > > > Could not retrieve endpoint ranges: > > java.lang.IllegalArgumentException > > java.lang.RuntimeException: Could not retrieve endpoint ranges: > > [same stack trace as above] > > > > I don't know whether this error is relative to one of cluster nodes' linux > > crash? > > > > Any advice will be appreciated! > > > > Dillon Peng > > -- > Yuki Morishita > t:yukim (http://twitter.com/yukim) >
Re: Running sstableloader from every node when migrating?
Thank you Robert and Anuja, It does not seem that sstable2json is the right tool to use: there is no documentation beyond Cassandra 1.2, and it requires a specific sstable to be given, which means a lot of manual work. The documentation also mentions it is good for testing/debugging, but I would need to migrate nearly 1 TB of data from a 6-node cluster to a 3-node one. Copying sstables/nodetool refresh does not seem like a great option either, unless I am missing something. Using sstableloader seems a more logical option. Still a bottleneck if you need to do it for every node in your source cluster. What if you had a 100-node cluster? I am thinking of just running a simple script instead, that selects data from the source cluster and inserts it into the target one. Kind regards, George On Tue, Dec 1, 2015 at 7:54 AM, anuja jain <anujaja...@gmail.com> wrote: > Hello George, > You can use sstable2json to create the json of your keyspace and then load > this json to your keyspace in the new cluster using the json2sstable utility. > > On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli <rc...@eventbrite.com> wrote: > >> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos <sigle...@textkernel.nl> >> wrote: >> >>> We would like to migrate one keyspace from a 6-node cluster to a 3-node >>> one. >>> >> >> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/ >> >> =Rob >> >> > >
Re: Running sstableloader from every node when migrating?
On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos wrote:
> We would like to migrate one keyspace from a 6-node cluster to a 3-node one.

http://www.pythian.com/blog/bulk-loading-options-for-cassandra/

=Rob
Re: Running sstableloader from every node when migrating?
Hello George,

You can use sstable2json to create the json of your keyspace and then load this json to your keyspace in the new cluster using the json2sstable utility.

On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli wrote:
> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos wrote:
>> We would like to migrate one keyspace from a 6-node cluster to a 3-node one.
>
> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/
>
> =Rob
Running sstableloader from every node when migrating?
Hello,

We would like to migrate one keyspace from a 6-node cluster to a 3-node one. Since an individual node does not contain all the data, this means that we should run sstableloader 6 times, once for each node of our cluster.

To be precise: do "nodetool flush", then run

sstableloader -d <3 target nodes>

Would that be the correct approach?

Thank you in advance,
George
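The per-node procedure described above could be scripted roughly as follows. This is a dry-run sketch only: node names, the keyspace/table, and the data path are hypothetical, and the commands are echoed rather than executed (in a real run you would drop the `echo` and point `DATA_DIR` at each node's actual sstable directory):

```shell
# Sketch: flush then bulk-load from each of the 6 source nodes into the
# 3-node target cluster. All names/paths below are placeholders.
SOURCE_NODES="src1 src2 src3 src4 src5 src6"
TARGETS="tgt1,tgt2,tgt3"
DATA_DIR="/var/lib/cassandra/data/mykeyspace/mytable"

# Dry run: print what would be executed on each source node.
for node in $SOURCE_NODES; do
  echo "ssh $node nodetool flush"
  echo "ssh $node sstableloader -d $TARGETS $DATA_DIR"
done
```

Since replicas overlap across the 6 source nodes, the same row may be streamed several times; sstableloader handles this correctly (later writes are deduplicated by timestamp), but it does inflate the total streaming time.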
Re: Data.db too large and after sstableloader still large
On Thu, Nov 12, 2015 at 6:44 AM, qihuang.zheng <qihuang.zh...@fraudmetrix.cn> wrote:
> My question is: why can't sstableloader balance the data file size?

Because it streams ranges from the source SSTable to a distributed set of ranges, especially if you are using vnodes. It is a general property of Cassandra's streaming that it results in SSTables that are likely different in size from those that result from a flush.

Why are you preoccupied with the sizes of files in the hundreds of megabytes? Why do you care about this amount of variance in file size?

=Rob
Re: Data.db too large and after sstableloader still large
Thanks, Rob.

We use spark-cassandra-connector to read data from the table, then do a repartition action. If some nodes have large files, running this task is too slow, maybe several hours, which is unacceptable. The nodes with small files finish quickly. So I think if sstableloader could split into small sizes and balance across all the nodes, our Spark job could run quickly.

Thanks, qihuang.zheng

---- Original Message ----
From: Robert Coli <rc...@eventbrite.com>
To: user@cassandra.apache.org
Sent: Friday, 13 November 2015, 04:04
Subject: Re: Data.db too large and after sstableloader still large
Data.db too large and after sstableloader still large
We did a snapshot, and found some Data.db files are too large:

[qihuang.zheng@spark047219 5]$ find . -type f -size +800M -print0 | xargs -0 ls -lh
-rw-r--r--. 2 qihuang.zheng users 1.5G Oct 28 14:49 ./forseti/velocity/forseti-velocity-jb-103631-Data.db

After running sstableloader to the new cluster, one node has this large file:

[qihuang.zheng@spark047243 velocity]$ ll -rth | grep Data
-rw-r--r--. 1 admin admin  46M Nov 12 18:22 forseti-velocity-jb-21-Data.db
-rw-r--r--. 1 admin admin 156M Nov 12 18:22 forseti-velocity-jb-22-Data.db
-rw-r--r--. 1 admin admin 2.6M Nov 12 18:22 forseti-velocity-jb-23-Data.db
-rw-r--r--. 1 admin admin 162M Nov 12 18:22 forseti-velocity-jb-24-Data.db
-rw-r--r--. 1 admin admin 1.5G Nov 12 18:22 forseti-velocity-jb-25-Data.db  <- big file still here

It seems sstableloader doesn't split files very well. Why can't sstableloader split the data into small files on the new cluster? I tried using sstablesplit on the snapshot before sstableloader, but this process is too slow.

Thanks, qihuang.zheng
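For reference, the sstablesplit attempt mentioned above looks roughly like this. This is a dry-run sketch: the file path is the big file from the listing above, the 100 MB chunk size is an arbitrary illustrative choice, and the command is echoed rather than run (sstablesplit must only be run against a snapshot or with Cassandra stopped, since it rewrites the sstable in place):

```shell
# Sketch: split a large snapshot SSTable into ~100 MB chunks before
# feeding it to sstableloader. Size is illustrative.
SSTABLE="./forseti/velocity/forseti-velocity-jb-103631-Data.db"
SPLIT_SIZE_MB=100

# Dry run: print the command instead of executing it.
echo "sstablesplit --no-snapshot -s $SPLIT_SIZE_MB $SSTABLE"
```

The split itself is sequential I/O over the whole 1.5 GB file, which is consistent with the "too slow" observation; it also does not change how sstableloader distributes the resulting ranges across target nodes.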
Re: Data.db too large and after sstableloader still large
Original snapshot files:

[qihuang.zheng@spark047219 226_1105]$ ll 2/forseti/velocity/ -h | grep Data
-rw-r--r--. 1 qihuang.zheng users 158M Oct 28 15:03 forseti-velocity-jb-102486-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 16:28 forseti-velocity-jb-103911-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 14:23 forseti-velocity-jb-103920-Data.db
-rw-r--r--. 1 qihuang.zheng users 370M Oct 28 14:10 forseti-velocity-jb-105829-Data.db  <- a big file (1)
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 14:07 forseti-velocity-jb-107113-Data.db
-rw-r--r--. 1 qihuang.zheng users 160M Oct 28 15:53 forseti-velocity-jb-73122-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 14:46 forseti-velocity-jb-85829-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 15:29 forseti-velocity-jb-87661-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 15:05 forseti-velocity-jb-93091-Data.db

After sstableloader to the new cluster:

[qihuang.zheng@cass047202 ~]$ ./psshA.sh ip_spark.txt 'ls /home/admin/cassandra/data/forseti/velocity -hl | grep Data'
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 22:29:43 [SUCCESS] 192.168.47.208
-rw-r--r--. 1 admin admin 365K Nov 12 22:10 forseti-velocity-jb-20-Data.db
-rw-r--r--. 1 admin admin 370M Nov 12 22:10 forseti-velocity-jb-21-Data.db  <- file still large, same size as (1)
-rw-r--r--. 1 admin admin  11M Nov 12 22:10 forseti-velocity-jb-22-Data.db
[2] 22:29:43 [SUCCESS] 192.168.47.212
-rw-r--r--. 1 admin admin 146M Nov 12 22:09 forseti-velocity-jb-22-Data.db
-rw-r--r--. 1 admin admin 3.7M Nov 12 22:09 forseti-velocity-jb-23-Data.db
[3] 22:29:43 [SUCCESS] 192.168.47.215
-rw-r--r--. 1 admin admin 916K Nov 12 22:09 forseti-velocity-jb-14-Data.db
[4] 22:29:43 [SUCCESS] 192.168.47.242  <- most of the data went to this node
-rw-r--r--. 1 admin admin 106M Nov 12 22:10 forseti-velocity-jb-24-Data.db
-rw-r--r--. 1 admin admin 160M Nov 12 22:10 forseti-velocity-jb-25-Data.db
-rw-r--r--. 1 admin admin 158M Nov 12 22:10 forseti-velocity-jb-26-Data.db
-rw-r--r--. 1 admin admin 160M Nov 12 22:10 forseti-velocity-jb-27-Data.db
[5] 22:29:43 [FAILURE] 192.168.47.223 Exited with error code 1  <- this node has no files
[6] 22:29:43 [SUCCESS] 192.168.47.244
-rw-r--r--. 1 admin admin 111M Nov 12 22:09 forseti-velocity-jb-18-Data.db
[7] 22:29:43 [SUCCESS] 192.168.47.245
-rw-r--r--. 1 admin admin  50M Nov 12 22:09 forseti-velocity-jb-22-Data.db
-rw-r--r--. 1 admin admin 170K Nov 12 22:09 forseti-velocity-jb-23-Data.db
[8] 22:29:43 [SUCCESS] 192.168.47.241
-rw-r--r--. 1 admin admin 7.5M Nov 12 22:09 forseti-velocity-jb-30-Data.db
[9] 22:29:43 [FAILURE] 192.168.47.218 Exited with error code 1  <- no files
[10] 22:29:43 [SUCCESS] 192.168.47.243
-rw-r--r--. 1 admin admin  15M Nov 12 22:09 forseti-velocity-jb-29-Data.db
[11] 22:29:43 [SUCCESS] 192.168.47.219
-rw-r--r--. 1 admin admin 160M Nov 12 22:09 forseti-velocity-jb-23-Data.db
[12] 22:29:43 [SUCCESS] 192.168.47.217
-rw-r--r--. 1 admin admin  30M Nov 12 22:09 forseti-velocity-jb-22-Data.db
[13] 22:29:44 [SUCCESS] 192.168.47.216
-rw-r--r--. 1 admin admin 3.5M Nov 12 22:09 forseti-velocity-jb-20-Data.db
-rw-r--r--. 1 admin admin 161M Nov 12 22:09 forseti-velocity-jb-21-Data.db

We use spark-cassandra-connector to read the table and repartition. The Spark repartition job shows that nodes with no Data.db files, like the two failed nodes above, have an input size of 0.0 B, while nodes with large files, like the last one, run for far too long.

My question is: why can't sstableloader balance the data file size?

Thanks, qihuang.zheng