Re: SSTableloader questions
> Can the sstableloader job run from outside a Cassandra node? or it has to be run from inside Cassandra node.

Yes. I'm a fan of running sstableloader on a server that is not one of the nodes in the cluster. You can maximise the throughput by running multiple instances of sstableloader, loading SSTables from separate sources/filesystems.

My suspicion is that the failed connection to the nodes is due to the SSL options, so check that you've specified the truststore/keystore correctly. Cheers!
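Since a refused connection can also be plain network reachability rather than SSL, a quick TCP probe of the storage port from the loader host can rule that out first. A minimal sketch (the host/port values below are placeholders, not the thread's actual cluster):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeout, unreachable host, etc.
        return False

# e.g. probe the ssl_storage_port (7001) on each -d host before streaming:
# for ip in ("ip1", "ip2", "ip3"):
#     print(ip, can_connect(ip, 7001))
```

If this returns False for a node, the problem is firewalling or the wrong storage port, and no amount of truststore/keystore fiddling will fix it.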
Re: SSTableloader questions
Hello Erick, I have one more question. Can the sstableloader job run from outside a Cassandra node, or does it have to be run from inside a Cassandra node? When I tried it from the Cassandra node it worked, but when I try to run it from outside the Cassandra cluster (a standalone machine which doesn't have any Cassandra process running) using the command below, it fails with a streaming error.

Command:

    $ /root/apache-cassandra-3.11.6/bin/sstableloader -d ip1,ip2,ip3 keyspace1/table1 \
        --truststore truststore.p12 --truststore-password cassandra \
        --keystore-password cassandra --keystore keystore.p12 \
        -v -u user -pw password --ssl-storage-port 7001 -prtcl TLS

Errors:

    ERROR 21:48:22,078 [Stream #be7a0de0-2530-11eb-bc56-c7c5c59d560b] Streaming error occurred on session with peer 10.66.129.194
    java.net.ConnectException: Connection refused
        at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_272]
        at sun.nio.ch.Net.connect(Net.java:482) ~[na:1.8.0_272]
        at sun.nio.ch.Net.connect(Net.java:474) ~[na:1.8.0_272]
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647) ~[na:1.8.0_272]
        at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_272]
        at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:283) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:270) ~[apache-cassandra-3.11.6.jar:3.11.6]
        at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:269) [apache-cassandra-3.11.6.jar:3.11.6]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_272]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_272]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.6.jar:3.11.6]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]

    progress: total: 100% 0.000KiB/s (avg: 0.000KiB/s)

On Mon, Nov 9, 2020 at 3:08 PM Jai Bheemsen Rao Dhanwada <jaibheem...@gmail.com> wrote:

> Thanks Erick, I will go through the posts and get back if I have any questions.
Re: SSTableloader questions
Thanks Erick, I will go through the posts and get back if I have any questions.

On Mon, Nov 9, 2020 at 1:58 PM Erick Ramirez wrote:

> A few months ago, I was asked a similar question so I wrote instructions for this. It depends on whether the clusters are identical or not. The posts define what "identical" means.
>
> If the source and target cluster are identical in configuration, follow the procedure here -- https://community.datastax.com/questions/4534/.
>
> If the source and target cluster have different configurations, follow the procedure here -- https://community.datastax.com/questions/4477/. Cheers!
Re: SSTableloader questions
A few months ago, I was asked a similar question so I wrote instructions for this. It depends on whether the clusters are identical or not. The posts define what "identical" means.

If the source and target cluster are identical in configuration, follow the procedure here -- https://community.datastax.com/questions/4534/.

If the source and target cluster have different configurations, follow the procedure here -- https://community.datastax.com/questions/4477/. Cheers!
SSTableloader questions
Hello, I have a few questions regarding restoring data from snapshots using sstableloader. If I have a 6-node Cassandra cluster with vnodes (256), and I have taken a snapshot of all 6 nodes and have to restore to another cluster:

1. Does the target cluster have to be the same size?
2. If 1 is true, does sstableloader have to use each snapshot from the source cluster and map it to the target nodes? source1 -> target1, source2 -> target2, source3 -> target3, source4 -> target4, source5 -> target5, source6 -> target6
3. If 1 is false, do I need to run sstableloader for all 6 snapshots from the source against the 3 nodes in the target?
4. Can I have a different schema (only the keyspace name) between the source and target clusters? e.g. keyspace1 in the source cluster but keyspace2 in the target.

Thanks in advance.
Re: sstableloader - warning vs. failure?
Ok, thanks very much for the answer!

On Fri, Feb 7, 2020 at 9:00 PM Erick Ramirez wrote:

>> INFO [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
>
> The message gets logged when SSTables are being cached and the cache fills up faster than objects are evicted from it. Note that the message is logged at INFO level (instead of WARN or ERROR) because there is no detrimental effect, but there will be a performance hit in the form of read latency. When space becomes available, it will just continue on to cache the next 64k chunk of the sstable.
>
> FWIW the default cache size (file_cache_size_in_mb in cassandra.yaml) is 512 MB (max memory of 536870912 in the log entry). Cheers!
Re: sstableloader - warning vs. failure?
> INFO [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576

The message gets logged when SSTables are being cached and the cache fills up faster than objects are evicted from it. Note that the message is logged at INFO level (instead of WARN or ERROR) because there is no detrimental effect, but there will be a performance hit in the form of read latency. When space becomes available, it will just continue on to cache the next 64k chunk of the sstable.

FWIW the default cache size (file_cache_size_in_mb in cassandra.yaml) is 512 MB (max memory of 536870912 in the log entry). Cheers!
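For reference, the byte counts in that log line line up with the defaults described above; a trivial sketch of the arithmetic:

```python
# Values copied from the log line itself.
max_memory = 536_870_912   # "Maximum memory usage reached", in bytes
chunk = 1_048_576          # "cannot allocate chunk of", in bytes

# 536870912 bytes is exactly 512 MiB, i.e. the default file_cache_size_in_mb.
print(max_memory // (1024 * 1024))  # 512
# The chunk it failed to allocate is exactly 1 MiB.
print(chunk // 1024)                # 1024 (KiB)
```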
sstableloader - warning vs. failure?
Hi folks,

When sstableloader hits a very large sstable, Cassandra may end up logging a message like this:

    INFO [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 - Maximum memory usage reached (536870912), cannot allocate chunk of 1048576

The loading process doesn't abort, and the sstableloader stdout logging appears to end up reporting success, e.g., with a few 100% totals across the nodes reported:

    progress: [/10.0.1.116]0:11/11 100% [/10.0.1.248]0:11/11 100% [/10.0.1.93]0:11/11 100% total: 100% 0.000KiB/s (avg: 36.156MiB/s)
    progress: [/10.0.1.116]0:11/11 100% [/10.0.1.248]0:11/11 100% [/10.0.1.93]0:11/11 100% total: 100% 0.000KiB/s (avg: 34.914MiB/s)
    progress: [/10.0.1.116]0:11/11 100% [/10.0.1.248]0:11/11 100% [/10.0.1.93]0:11/11 100% total: 100% 0.000KiB/s (avg: 33.794MiB/s)

    Summary statistics:
        Connections per host    : 1
        Total files transferred : 33
        Total bytes transferred : 116.027GiB
        Total duration          : 3515748 ms
        Average transfer rate   : 33.794MiB/s
        Peak transfer rate      : 53.130MiB/s

In these situations, is sstableloader hitting the memory issue and then retrying a few times until it succeeds? Or is it silently dropping data on the floor? I'd assume the former, but thought it'd be good to ask you folks to be sure...

Jim
Re: sstableloader: How much does it actually need?
Just mulling this based on some code and log digging I was doing while trying to have Reaper stay on top of our cluster. I think maybe the caveat here relates to eventual consistency. C* doesn't do state changes as distributed transactions. The assumption here is that RF=3 implies that at any given instant in real time, either the data is visible nowhere, or it is visible in 3 places. That's a conceptual simplification, but not a real-time invariant when you don't have a transactional horizon to perfectly determine visibility of data.

When you have C* usage antipatterns, like a client that is determined to read back data it just wrote as though there were a session context that somehow provided repeatable-read guarantees, under the covers in the logs you can see C* fighting to do on-the-fly repairs to push through the requested level of consistency before responding to the query. Which means that, for some period of time, achieving consistency was still work in flight.

I've also read about some boundary screw cases, like drift in time resolution between servers creating the opportunity for stale data, which repairs I think would fix. I haven't tested the scenario though, so I'm not sure how real the situation is.

Bottom line though: minus repairs, I think having all the nodes is getting you all your chances to repair the problems. And if the data is mutating as you are grabbing it, the entire frontier of changes is 'minus repairs'. Since tokens are distributed somewhat randomly, you don't know where you need to make up the differences afterwards. That's about as far as my navel gazing goes on that.

From: manish khandelwal
Reply-To: "user@cassandra.apache.org"
Date: Friday, February 7, 2020 at 12:22 AM
To: "user@cassandra.apache.org"
Subject: Re: sstableloader: How much does it actually need?
Re: sstableloader: How much does it actually need?
Yes, you will have all the data in two nodes, provided there are no mutation drops at the node level, or the data is repaired.

For example, suppose your data is A, B, C and D, with RF=3 and 4 nodes (node1, node2, node3 and node4):

    Data A is on node1, node2 and node3
    Data B is on node2, node3 and node4
    Data C is on node3, node4 and node1
    Data D is on node4, node1 and node2

With this configuration, any *two nodes combined* will give all the data.

Regards
Manish

On Fri, Feb 7, 2020 at 12:53 AM Voytek Jarnot wrote:

> Been thinking about it, and I can't really see how with 4 nodes and RF=3, any 2 nodes would *not* have all the data; but am more than willing to learn.
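This placement example can be checked mechanically. A small sketch (node and datum names are the hypothetical ones from the example above) verifying that every 2-node combination covers all four data items:

```python
from itertools import combinations

# RF=3 on 4 nodes: each datum lives on 3 consecutive nodes of the ring.
nodes = ["node1", "node2", "node3", "node4"]
replicas = {
    "A": {"node1", "node2", "node3"},
    "B": {"node2", "node3", "node4"},
    "C": {"node3", "node4", "node1"},
    "D": {"node4", "node1", "node2"},
}
all_data = set(replicas)

for pair in combinations(nodes, 2):
    # Data items for which at least one replica sits on one of the two nodes.
    held = {d for d, r in replicas.items() if r & set(pair)}
    assert held == all_data  # any two nodes combined hold every datum
```

Note this only confirms the token-placement argument; it says nothing about dropped mutations, which is exactly the caveat raised in the message above.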
Re: sstableloader: How much does it actually need?
Been thinking about it, and I can't really see how with 4 nodes and RF=3, any 2 nodes would *not* have all the data; but am more than willing to learn.

On the other thing: that's an attractive option, but in our case, the target cluster will likely come into use before the source-cluster data is available to load. Seemed to me the safest approach was sstableloader.

Thanks

On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez wrote:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the full copy of data. I'd rather not say "it depends".
>
> TIP: If the nodes in the target cluster have identical tokens allocated, you can just do a straight copy of the sstables node-for-node then do nodetool refresh. If the target cluster is already built and you can't assign the same tokens then sstableloader is your only option. Cheers!
>
> P.S. No need to apologise for asking questions. That's what we're all here for. Just keep them coming.
Re: sstableloader: How much does it actually need?
> Another option is the DSE bulk loader, but it will require converting to CSV/JSON (a good option if you don't want to play with sstableloader and deal with getting all the sstables from all the nodes):
> https://docs.datastax.com/en/dsbulk/doc/index.html

Thanks, Sergio. The DataStax Bulk Loader was developed for a completely different use case. It doesn't really make sense to go through the trouble of converting the SSTables to CSV/JSON when you've already got the SSTables to begin with. ☺ It was really designed for loading/unloading data from non-C* sources as a replacement for the COPY command. Cheers!
Re: sstableloader: How much does it actually need?
Another option is to use the Spark migrator; it reads a source CQL cluster and writes to another. It has a validation stage that compares a full scan and reports the diff: https://github.com/scylladb/scylla-migrator

There are many more ways to clone a cluster. My main recommendation is to 'optimize' for correctness and simplicity first, and only last optimize for performance/time. Machine time for such a rare operation is cheap, engineering time is expensive, and data inconsistency is priceless.

On Wed, Feb 5, 2020 at 5:24 PM Sergio wrote:

> Another option is the DSE bulk loader, but it will require converting to CSV/JSON (a good option if you don't want to play with sstableloader and deal with getting all the sstables from all the nodes):
> https://docs.datastax.com/en/dsbulk/doc/index.html
>
> Cheers
>
> Sergio
Re: sstableloader: How much does it actually need?
Another option is the DSE bulk loader, but it will require converting to CSV/JSON (a good option if you don't want to play with sstableloader and deal with getting all the sstables from all the nodes): https://docs.datastax.com/en/dsbulk/doc/index.html

Cheers

Sergio

On Wed, Feb 5, 2020 at 4:56 PM Erick Ramirez wrote:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the full copy of data. I'd rather not say "it depends".
>
> TIP: If the nodes in the target cluster have identical tokens allocated, you can just do a straight copy of the sstables node-for-node then do nodetool refresh. If the target cluster is already built and you can't assign the same tokens then sstableloader is your only option. Cheers!
Re: sstableloader: How much does it actually need?
Unfortunately, there isn't a guarantee that 2 nodes alone will have the full copy of data. I'd rather not say "it depends".

TIP: If the nodes in the target cluster have identical tokens allocated, you can just do a straight copy of the sstables node-for-node then do nodetool refresh. If the target cluster is already built and you can't assign the same tokens, then sstableloader is your only option. Cheers!

P.S. No need to apologise for asking questions. That's what we're all here for. Just keep them coming.
sstableloader: How much does it actually need?
Scenario: Cassandra 3.11.x, 4 nodes, RF=3; moving to an identically-sized cluster via snapshots and sstableloader.

As far as I can tell, in the topology given above, any 2 nodes contain all of the data. In terms of migrating this cluster, would there be any downsides or risks with snapshotting and loading (sstableloader) only 2 of the nodes rather than all 4?

Apologies for the spate of hypotheticals lately; this project is making life interesting.

Thanks,
Voytek Jarnot
Re: [EXTERNAL] Re: sstableloader & num_tokens change
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots from 4 nodes to 4 nodes (RF 3 on both ends), and did not notice a spike. That's not to say that it didn't happen, but I think I'd have noticed, as I was loading approx 250GB x 4 (although sequentially rather than 4x sstableloader in parallel).

Also, thanks to everyone for confirming no issue with num_tokens and sstableloader; appreciate it.

On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R wrote:

> I would suggest to be aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), it will also get written to the RF of the new cluster (3 more times). So, you could see data expansion of 9x the original data size (or, origin RF * target RF), until compaction can run.
>
> Sean Durity – Staff Systems Engineer, Cassandra
RE: [EXTERNAL] Re: sstableloader & num_tokens change
I would suggest being aware of potential data size expansion. If you load (for example) three copies of the data into a new cluster (because the RF of the origin cluster is 3), it will also get written to the RF of the new cluster (3 more times). So, you could see data expansion of 9x the original data size (or, origin RF * target RF), until compaction can run.

Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change

> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
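The 9x figure above falls straight out of the two replication factors; a trivial sketch of the worst-case pre-compaction footprint:

```python
origin_rf = 3   # copies already present across the set of source snapshots
target_rf = 3   # copies the target cluster writes for every streamed row

# If every source node's snapshot is loaded, each row is streamed origin_rf
# times, and the target replicates each streamed copy target_rf times.
worst_case_expansion = origin_rf * target_rf
print(worst_case_expansion)  # 9x the logical data size, until compaction dedupes
```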
Re: sstableloader & num_tokens change
Hello

Concerning the original question, I agree with @erick_ramirez: sstableloader is transparent to the token allocation number.

Just for info @voytek, check this post out: https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

You may be interested to know whether your cluster is well balanced with 32 tokens. 32 tokens seems to be the future default value, but changing the default vnode token number seems not to be so straightforward.

cheers

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay

On Sat, Jan 25, 2020 at 5:05 AM Erick Ramirez wrote:

> On the subject of DSBulk, sstableloader is the tool of choice for this scenario.
>
> +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!
Re: sstableloader & num_tokens change
On the subject of DSBulk, sstableloader is the tool of choice for this scenario.

+1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader for CSV/JSON formats. Cheers!
Re: sstableloader & num_tokens change
> If I may just loop this back to the question at hand: > > I'm curious if there are any gotchas with using sstableloader to restore > snapshots taken from 256-token nodes into a cluster with 32-token (or your > preferred number of tokens) nodes (otherwise same # of nodes and same RF). > No, there isn't. It will work as designed so you're good to go. Cheers! >
Re: sstableloader & num_tokens change
If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token (or your preferred number of tokens) nodes (otherwise same # of nodes and same RF).

On Fri, Jan 24, 2020 at 11:15 AM Sergio wrote:

> https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html
>
> Just skimming through the docs, I see examples of loading from CSV/JSON. Maybe there is some other command or doc page that I am missing.
Re: sstableloader & num_tokens change
https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html

Just skimming through the docs, I see examples of loading from CSV/JSON. Maybe there is some other command or doc page that I am missing.

On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth wrote:

> Dsbulk works the same as sstableloader.
Re: sstableloader & num_tokens change
Dsbulk works the same as sstableloader.

Regards,
Nitan
Cell: 510 449 9629

On Jan 24, 2020, at 10:40 AM, Sergio wrote:

> I was wondering if that improvement for token allocation would work even with just one rack. It should, but I am not sure.
>
> Does Dsbulk support cluster-to-cluster migration without CSV or JSON export?
>
> Thanks and Regards
Re: sstableloader & num_tokens change
Why? It seems to me that the old Cassandra -> CSV/JSON and CSV/JSON -> new Cassandra steps are unnecessary in my case.

On Fri, Jan 24, 2020 at 10:34 AM Nitan Kainth wrote:

> Instead of sstableloader, consider dsbulk by DataStax.
Re: sstableloader & num_tokens change
I was wondering if that improvement for token allocation would work even with just one rack. It should, but I am not sure.

Does Dsbulk support migration cluster to cluster without CSV or JSON export?

Thanks and Regards

On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth wrote:
> Instead of sstableloader consider dsbulk by datastax.
Re: sstableloader & num_tokens change
Instead of sstableloader consider dsbulk by datastax.

On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback wrote:
> Jon Haddad has previously made the case for num_tokens=4. His Accelerate 2019 talk is available at: https://www.youtube.com/watch?v=swL7bCnolkU
Re: sstableloader & num_tokens change
Jon Haddad has previously made the case for num_tokens=4. His Accelerate 2019 talk is available at:

https://www.youtube.com/watch?v=swL7bCnolkU

You might want to check that out. Also, I think the amount of effort you put into evening out the token distribution increases as vnode count shrinks. The caveats are explored at:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html

From: Voytek Jarnot
Reply-To: "user@cassandra.apache.org"
Date: Friday, January 24, 2020 at 10:39 AM
To: "user@cassandra.apache.org"
Subject: sstableloader & num_tokens change
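For a concrete picture of the target cluster's configuration, a minimal sketch of the relevant cassandra.yaml settings (the keyspace name is a placeholder; `allocate_tokens_for_keyspace` is the 3.x setting that drives the even-distribution algorithm the blog post above describes):

```shell
# Sketch of the cassandra.yaml fragment for the new 32-token cluster.
# "my_keyspace" is a placeholder for your main RF=3 keyspace.
cat > cassandra_yaml_fragment.yaml <<'EOF'
num_tokens: 32
# Optimise token allocation for the replication factor of this keyspace;
# matters most at low vnode counts:
allocate_tokens_for_keyspace: my_keyspace
EOF
cat cassandra_yaml_fragment.yaml
```

Note the setting only influences token allocation when nodes bootstrap, so it has to be in place before the new nodes join.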
sstableloader & num_tokens change
Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4 node RF=3 cluster. I've read that 256 is not an optimal default num_tokens value, and that 32 is likely a better option. We have the "opportunity" to switch, as we're migrating environments and will likely be using sstableloader to do so. I'm curious if there are any gotchas with using sstableloader to restore snapshots taken from 256-token nodes into a cluster with 32-token nodes (otherwise same # of nodes and same RF). Thanks in advance.
Re: [EXTERNAL] Re: Sstableloader
It appears you have two goals you are trying to accomplish at the same time. My recommendation is to break it into two different steps. You need to decide if you are going to upgrade DSE or OSS.

* Upgrade DSE, then migrate to OSS:
  * Upgrade DSE to the version that matches the OSS 3.11.3 binary
  * Perform a datacenter switch
* Migrate to OSS, then upgrade:
  * Migrate to OSS using the version that matches the DSE Cassandra binary (DSE 5.0.7 = 3.0.11)
  * Upgrade OSS to the 3.11.3 binary

From: Rahul Reddy
Date: Thursday, May 30, 2019 at 6:37 AM
To: Cassandra User List
Cc: Anthony Goetz
Subject: [EXTERNAL] Re: Sstableloader

> Thank you Anthony and Jonathan. To add a new ring it doesn't have to be the same version of Cassandra, right? For example, DSE 5.1.2, which is 3.11.0, has sstables with the "mc" name, and Apache 3.11.3 also uses sstable names with "mc". We should still be able to add it to the ring, correct?
From: Jonathan Koppenhofer <j...@koppedomain.com>
Reply-To: Cassandra User List <user@cassandra.apache.org>
Date: Wednesday, May 29, 2019 at 6:45 PM
To: Cassandra User List <user@cassandra.apache.org>
Subject: [EXTERNAL] Re: Sstableloader

> Has anyone tried to do a DC switch as a means to migrate from DataStax to OSS? This would be the safest route, as the ability to revert back to DataStax is easy. However, I'm curious how the dse_system keyspace would be replicated to OSS using their custom Everywhere strategy. You may have to change that to NetworkTopologyStrategy before firing up OSS nodes. Also, keep in mind if you restart any DSE nodes, it will revert that keyspace back to EverywhereStrategy.
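The replication-strategy change the plan above requires can be sketched as a CQL script; keyspace, DC name, and RF are placeholders for your topology, and as noted you may need to re-run it after any DSE node restart:

```shell
# Hedged sketch: generate the ALTER KEYSPACE statement to move a keyspace
# off EverywhereStrategy. "DC1" and RF 3 are placeholders; repeat for every
# keyspace still using Everywhere before any OSS node joins.
cat > fix_replication.cql <<'EOF'
ALTER KEYSPACE dse_system
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
EOF
# Against a live DSE node you would run:
#   cqlsh old-node1 -f fix_replication.cql
cat fix_replication.cql
```

Wiring this into the node init script (as Anthony's team did) keeps DSE restarts from silently reverting the strategy mid-migration.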
Re: Sstableloader
Thank you Anthony and Jonathan. To add a new ring it doesn't have to be the same version of Cassandra, right? For example, DSE 5.1.2, which is 3.11.0, has sstables with the "mc" name, and Apache 3.11.3 also uses sstable names with "mc". We should still be able to add it to the ring, correct?

On Wed, May 29, 2019, 9:55 PM Goetz, Anthony wrote:
> My team migrated from DSE to OSS a few years ago by doing a datacenter switch. You will need to update the replication strategy for all keyspaces that are using Everywhere to NetworkTopologyStrategy before adding any OSS nodes.
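Rahul's "mc" observation above can be turned into a quick sanity check: the SSTable filename prefix identifies the on-disk format version, so matching prefixes on both sides is a reasonable first compatibility test. A small illustration with a dummy file:

```shell
# Demonstration only: create a dummy SSTable filename and extract its
# format-version prefix ("mc" = the 3.0/3.11-era format).
mkdir -p demo/keyspace1/table1
touch demo/keyspace1/table1/mc-11-big-Data.db
for f in demo/keyspace1/table1/*-Data.db; do
  base=$(basename "$f")
  echo "${base%%-*}"   # prints "mc"
done
```

Matching prefixes are necessary but not sufficient — same-named formats from different vendors can still diverge in edge cases, which is why the replies recommend matching the exact binary version.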
Re: Sstableloader
Over the past year we've migrated several clusters from DSE to Apache Cassandra. We've mostly done in-place conversions node by node with no downtime, going from DSE 4.8.x to Apache Cassandra 2.1.x.

On Wed, May 29, 2019 at 8:55 PM Goetz, Anthony wrote:
> My team migrated from DSE to OSS a few years ago by doing a datacenter switch. You will need to update the replication strategy for all keyspaces that are using Everywhere to NetworkTopologyStrategy before adding any OSS nodes.
Re: Sstableloader
My team migrated from DSE to OSS a few years ago by doing a datacenter switch. You will need to update the replication strategy for all keyspaces that are using Everywhere to NetworkTopologyStrategy before adding any OSS nodes. As Jonathan mentioned, DSE nodes will revert this change on restart. To account for this, we modified our init script to call a cql script that would make sure the keyspaces were set back to NetworkTopologyStrategy.

High Level Plan:
* Find the DSE Cassandra binary version
* Review config to make sure you are not using any DSE-specific settings
* Update the replication strategy on keyspaces using Everywhere to NetworkTopologyStrategy
* Add an OSS DC using the same binary version as DSE
* Migrate clients to the new OSS DC
* Decommission the DSE DC

Note: OpsCenter will stop working once you add OSS nodes.

From: Jonathan Koppenhofer
Reply-To: Cassandra User List
Date: Wednesday, May 29, 2019 at 6:45 PM
To: Cassandra User List
Subject: [EXTERNAL] Re: Sstableloader

> Has anyone tried to do a DC switch as a means to migrate from DataStax to OSS? This would be the safest route, as the ability to revert back to DataStax is easy.
Re: Sstableloader
Has anyone tried to do a DC switch as a means to migrate from DataStax to OSS? This would be the safest route, as the ability to revert back to DataStax is easy. However, I'm curious how the dse_system keyspace would be replicated to OSS using their custom Everywhere strategy. You may have to change that to NetworkTopologyStrategy before firing up OSS nodes. Also, keep in mind if you restart any DSE nodes, it will revert that keyspace back to EverywhereStrategy.

I also posted a means to migrate in place on this mailing list a few months back (thanks for help from others on the mailing list), but it is a little more involved and risky. Let me know if you can't find it, and I'll dig it up.

Finally, DSE 5.0's open-source equivalent is 3.0.x. I recommend you go to OSS 3.0, then up to 3.11.

On Wed, May 29, 2019, 5:56 PM Nitan Kainth wrote:
> If the Cassandra version is the same, it should work.
Re: Sstableloader
If the Cassandra version is the same, it should work.

Regards,
Nitan
Cell: 510 449 9629

> On May 28, 2019, at 4:21 PM, Rahul Reddy wrote:
>
> Hello,
>
> Does sstableloader work between DataStax and Apache Cassandra? I'm trying to migrate DSE 5.0.7 to Apache 3.11.1.
Re: Sstableloader
Hello,

I can't answer this question about sstableloader (even though I think it should be OK). My understanding, even though I'm not really up to date with the latest DataStax work, is that DSE uses a modified but compatible version of Cassandra for everything that is not specifically a 'DSE feature'. In particular, I expect the SSTable format to be the same. sstableloader has always been slow and inefficient for me, though I did not use it much.

I think the way out of DSE should be documented somewhere in the DataStax docs; if not, I think you can ask DataStax directly (or maybe someone here can help you). My guess is that the safest way out, without any downtime, is probably to perform a datacenter 'switch':

- Identify the Apache Cassandra version used under the hood by DSE (5.0.7). Let's say it's 3.11.1 (I don't know).
- Add a new Apache Cassandra datacenter to your DSE cluster using this version (I would rather use 3.11.latest in this case though... 3.11.1 had memory leaks and other wild issues).
- Move clients to this new DC.
- Shut down the old DC.

I wrote a runbook to perform such an operation not that long ago; you can find it here: https://thelastpickle.com/blog/2019/02/26/data-center-switch.html

I don't know for sure that this is the best way to go out of DSE, but that would be my guess and the first thing I would investigate (before sstableloader, clearly). Hope that helps, even though it does not directly answer the question (that I'm unable to answer) about SSTable & sstableloader compatibility with DSE clusters.

C*heers

On Tue, May 28, 2019 at 10:22 PM, Rahul Reddy wrote:
> Hello,
>
> Does sstableloader work between DataStax and Apache Cassandra? I'm trying to migrate DSE 5.0.7 to Apache 3.11.1?
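The two operational commands at the heart of the datacenter switch above can be sketched as follows; keyspace, DC names, and RF are placeholders, and the exact sequence (client switchover, repairs) is covered in the runbook:

```shell
# Hedged sketch of the DC-switch core steps; names are placeholders.
cat > dc_switch_steps.sh <<'EOF'
#!/bin/sh
# 1. Replicate each application keyspace to the new OSS DC as well:
cqlsh old-node1 -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dse_dc': 3, 'oss_dc': 3};"
# 2. On each node of the new DC, stream the existing data from the old DC:
nodetool rebuild -- dse_dc
# 3. Once clients are moved, drop the old DC from replication and
#    decommission its nodes.
EOF
cat dc_switch_steps.sh
```

`nodetool rebuild` streams data without the token-range validation of a bootstrap, which is what makes the new DC usable without downtime on the old one.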
Sstableloader
Hello,

Does sstableloader work between DataStax and Apache Cassandra? I'm trying to migrate DSE 5.0.7 to Apache 3.11.1.
re: Trouble restoring with sstableloader
This is a response to a message from 2017 that I found unanswered on the user list; we were getting the same error. In this Stack Overflow answer, https://stackoverflow.com/questions/53160611/frame-size-352518912-larger-than-max-length-15728640-exception-while-runnin/55751104#55751104, I have noted what we had to do to get things working.

In that case it appears the -tf and/or the various keystore/truststore params weren't supplied. In our case we weren't passing the -tf parameter... then we ran into the PKIX error.

Original message:
---
Hi all,

I've been running into the following issue while trying to restore a C* database via sstableloader:

Could not retrieve endpoint ranges:
org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144)
    at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95)
Caused by: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1327)
    at org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1315)
    at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:256)
    ... 2 more

This seems odd since the frame size thrift is asking for is over 336 MB. This is happening using Cassandra 2.0.12 | Thrift protocol 19.39.0.

Any advice? Thanks!
--Jim
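A sketch of the fix described above — pointing -tf at the SSL transport factory so the tool stops misreading the server's SSL handshake as a giant Thrift frame. Paths, passwords, and hosts are placeholders, and the flag names are from the 2.0-era tool, so check `sstableloader --help` on your version:

```shell
# Hedged sketch: sstableloader invocation with the Thrift SSL transport
# factory supplied via -tf, plus truststore/keystore options.
cat > load_with_ssl.sh <<'EOF'
#!/bin/sh
sstableloader -d node1,node2 \
  -tf org.apache.cassandra.thrift.SSLTransportFactory \
  -ts /path/to/truststore.jks -tspw changeit \
  -ks /path/to/keystore.jks -kspw changeit \
  /data/snapshots/keyspace1/table1
EOF
cat load_with_ssl.sh
```

The "frame size larger than max length" symptom is a useful tell: it usually means a plaintext Thrift client is reading encrypted bytes, not that any real frame is 336 MB.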
streaming errors with sstableloader
Hello community,

I'm receiving some strange streaming errors while trying to restore certain sstable snapshots with sstableloader to a new cluster. While the cluster is up and running and the nodes are communicating with each other, I can see streams failing to the nodes for no obvious reason, and the only exception thrown is:

ERROR 14:00:08,403 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Streaming error occurred on session with peer 10.35.81.88
java.lang.NullPointerException: null
    at org.apache.cassandra.db.SerializationHeader$Component.access$400(SerializationHeader.java:271) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.db.SerializationHeader$Serializer.serialize(SerializationHeader.java:445) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.FileMessageHeader$FileMessageHeaderSerializer.serialize(FileMessageHeader.java:216) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:94) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:52) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:50) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:408) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:380) ~[apache-cassandra-3.11.3.jar:3.11.3]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]

progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 2.652KiB/s (avg: 2.652KiB/s)
progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 0.000KiB/s (avg: 2.651KiB/s)
progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 0.000KiB/s (avg: 2.650KiB/s)

ERROR 14:00:08,406 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Streaming error occurred on session with peer 10.35.81.79
java.lang.NullPointerException: null [stack trace identical to the one above]

progress: [/10.35.81.88]0:0/3 0% [/10.35.81.79]0:1/3 0% [cassandra01-test.sofia.elex.be/10.35.81.76]0:1/3 0% total: 0% 0.000KiB/s (avg: 2.650KiB/s)

ERROR 14:00:08,407 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Remote peer 10.35.81.88 failed stream session.
ERROR 14:00:08,408 [Stream #3d572210-f95f-11e8-bf2d-01149b1d085c] Streaming error occurred on session with peer 10.35.81.76
java.lang.NullPointerException: null [stack trace identical to the one above]
Re: Problem with restoring a snapshot using sstableloader
On Mon, Dec 3, 2018 at 4:24 PM Oliver Herrmann wrote: > > You are right. The number of nodes in our cluster is equal to the > replication factor. For that reason I think it should be sufficient to call > sstableloader only from one node. > The next question is then: do you care about consistency of data restored from one snapshot? Is the snapshot taken after repair? Do you still write to those tables? In other words, your data will be consistent after restoring from one node's snapshot only if you were writing with consistency level ALL (or equal to your replication factor and, transitively, to the number of nodes). -- Oleksandr "Alex" Shulgin | Senior Software Engineer | Team Flux | Data Services | Zalando SE | Tel: +49 176 127-59-707
Re: Problem with restoring a snapshot using sstableloader
On Sun, Dec 2, 2018 at 6:24 AM, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Fri, 30 Nov 2018, 17:54 Oliver Herrmann wrote:
> > When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.
>
> OK, not entirely sure that's a reasonable setup, but do you imply that with sstableloader you don't need to process every snapshot taken -- that is, also visiting every node? That would only be true if your replication factor equals the number of nodes, IMO.

You are right. The number of nodes in our cluster is equal to the replication factor. For that reason I think it should be sufficient to call sstableloader only from one node.
Re: Problem with restoring a snapshot using sstableloader
It's a bug in sstableloader introduced many years ago - before that, it worked as described in the documentation...

Oliver Herrmann at "Fri, 30 Nov 2018 17:05:43 +0100" wrote:

OH> Hi,
OH> I'm having some problems restoring a snapshot using sstableloader. I'm using Cassandra 3.11.1 and followed the instructions for creating and restoring from this page:
OH> https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/toolsBulkloader.html
OH>
OH> 1. Called nodetool cleanup on each node
OH>    $ nodetool cleanup cass_testapp
OH>
OH> 2. Called nodetool snapshot on each node
OH>    $ nodetool snapshot -t snap1 -kt cass_testapp.table3
OH>
OH> 3. Checked the data and snapshot folders:
OH>    $ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb
OH>    drwxr-xr-x 2 cassandra cassandra    6 Nov 29 03:54 backups
OH>    -rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
OH>    -rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
OH>    -rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
OH>    -rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
OH>    -rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
OH>    -rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
OH>    -rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
OH>    -rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
OH>    drwxr-xr-x 3 cassandra cassandra   18 Nov 30 10:30 snapshots
OH>
OH>    and
OH>
OH>    $ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
OH>    total 44
OH>    -rw-r--r-- 1 cassandra cassandra   32 Nov 30 10:30 manifest.json
OH>    -rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
OH>    -rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
OH>    -rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
OH>    -rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
OH>    -rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
OH>    -rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
OH>    -rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
OH>    -rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
OH>    -rw-r--r-- 1 cassandra cassandra 1043 Nov 30 10:30 schema.cql
OH>
OH> 4. Truncated the table
OH>    cqlsh:cass_testapp> TRUNCATE table3 ;
OH>
OH> 5. Tried to restore table3 on one cassandra node
OH>    $ sstableloader -d localhost /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
OH>    Established connection to initial hosts
OH>    Opening sstables and calculating sections to stream
OH>    Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist
OH>    Summary statistics:
OH>       Connections per host    : 1
OH>       Total files transferred : 0
OH>       Total bytes transferred : 0.000KiB
OH>       Total duration          : 2652 ms
OH>       Average transfer rate   : 0.000KiB/s
OH>       Peak transfer rate      : 0.000KiB/s
OH>
OH> I'm always getting the message "Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist". I also tried to rename the snapshots folder into the keyspace name (cass_testapp), but then I get the message "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".
OH>
OH> What am I doing wrong?
OH>
OH> Thanks
OH> Oliver

--
With best wishes, Alex Ott
Solutions Architect EMEA, DataStax
http://datastax.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
Re: Problem with restoring a snapshot using sstableloader
On Fri, 30 Nov 2018, 17:54 Oliver Herrmann wrote:

> When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.

OK, not entirely sure that's a reasonable setup, but do you imply that with sstableloader you don't need to process every snapshot taken -- that is, also visiting every node? That would only be true if your replication factor equals the number of nodes, IMO.

--
Alex
Re: Problem with restoring a snapshot using sstableloader
Thanks Dmitry, that solved my problem.

Oliver

Original message
Subject: Re: Problem with restoring a snapshot using sstableloader
From: Dmitry Saprykin
To: user@cassandra.apache.org

> You need to move your files into a directory named 'cass_testapp/table3/'. sstableloader uses the last 2 path components as the keyspace and table names.
>
> On Fri, Nov 30, 2018 at 11:54 AM Oliver Herrmann <o.herrmann...@gmail.com> wrote:
> > When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.
Re: Problem with restoring a snapshot using sstableloader
You need to move your files into a directory named 'cass_testapp/table3/'. sstableloader uses the last two path components as the keyspace and table names.

On Fri, Nov 30, 2018 at 11:54 AM Oliver Herrmann wrote:

> When using nodetool refresh I must have write access to the data folder
> and I have to do it on every node. In our production environment the user
> that would do the restore does not have write access to the data folder.
Re: Problem with restoring a snapshot using sstableloader
When using nodetool refresh I must have write access to the data folder and I have to do it on every node. In our production environment the user that would do the restore does not have write access to the data folder.

On Fri, 30 Nov 2018 at 17:39, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:

> On Fri, Nov 30, 2018 at 5:13 PM Oliver Herrmann wrote:
>
>> I'm always getting the message "Skipping file mc-11-big-Data.db: table
>> snapshots.table3 doesn't exist". I also tried to rename the snapshots
>> folder into the keyspace name (cass_testapp) but then I get the message
>> "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".
>
> Hi,
>
> I imagine moving the files from the snapshot directory to the data directory
> and then running `nodetool refresh` is the supported way. Why use
> sstableloader for that?
>
> --
> Alex
Re: Problem with restoring a snapshot using sstableloader
On Fri, Nov 30, 2018 at 5:13 PM Oliver Herrmann wrote:

> I'm always getting the message "Skipping file mc-11-big-Data.db: table
> snapshots.table3 doesn't exist". I also tried to rename the snapshots
> folder into the keyspace name (cass_testapp) but then I get the message
> "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist".

Hi,

I imagine moving the files from the snapshot directory to the data directory and then running `nodetool refresh` is the supported way. Why use sstableloader for that?

--
Alex
Problem with restoring a snapshot using sstableloader
Hi,

I'm having some problems restoring a snapshot using sstableloader. I'm using Cassandra 3.11.1 and followed the instructions for creating and restoring a snapshot from this page: https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/toolsBulkloader.html

1. Called nodetool cleanup on each node

$ nodetool cleanup cass_testapp

2. Called nodetool snapshot on each node

$ nodetool snapshot -t snap1 -kt cass_testapp.table3

3. Checked the data and snapshot folders:

$ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb
drwxr-xr-x 2 cassandra cassandra    6 Nov 29 03:54 backups
-rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
-rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
-rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
-rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
-rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
drwxr-xr-x 3 cassandra cassandra   18 Nov 30 10:30 snapshots

and

$ ll /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
total 44
-rw-r--r-- 1 cassandra cassandra   32 Nov 30 10:30 manifest.json
-rw-r--r-- 2 cassandra cassandra   43 Nov 30 10:21 mc-11-big-CompressionInfo.db
-rw-r--r-- 2 cassandra cassandra  241 Nov 30 10:21 mc-11-big-Data.db
-rw-r--r-- 2 cassandra cassandra    9 Nov 30 10:21 mc-11-big-Digest.crc32
-rw-r--r-- 2 cassandra cassandra   16 Nov 30 10:21 mc-11-big-Filter.db
-rw-r--r-- 2 cassandra cassandra   21 Nov 30 10:21 mc-11-big-Index.db
-rw-r--r-- 2 cassandra cassandra 4938 Nov 30 10:21 mc-11-big-Statistics.db
-rw-r--r-- 2 cassandra cassandra   95 Nov 30 10:21 mc-11-big-Summary.db
-rw-r--r-- 2 cassandra cassandra   92 Nov 30 10:21 mc-11-big-TOC.txt
-rw-r--r-- 1 cassandra cassandra 1043 Nov 30 10:30 schema.cql

4. Truncated the table

cqlsh:cass_testapp> TRUNCATE table3 ;

5. Tried to restore table3 on one cassandra node

$ sstableloader -d localhost /var/lib/cassandra/data/cass_testapp/table3-7227e480f3b411e8941285913bce94cb/snapshots/snap1/
Established connection to initial hosts
Opening sstables and calculating sections to stream
Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist
Summary statistics:
  Connections per host: 1
  Total files transferred : 0
  Total bytes transferred : 0.000KiB
  Total duration : 2652 ms
  Average transfer rate : 0.000KiB/s
  Peak transfer rate : 0.000KiB/s

I'm always getting the message "Skipping file mc-11-big-Data.db: table snapshots.table3 doesn't exist". I also tried to rename the snapshots folder into the keyspace name (cass_testapp) but then I get the message "Skipping file mc-11-big-Data.db: table snap1.snap1. doesn't exist". What am I doing wrong?

Thanks
Oliver
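Dmitry's fix further up the thread (sstableloader infers the keyspace and table from the last two directory names of the path you pass it) can be sketched as the following shell steps. The staging location is illustrative, and a stand-in file is created here in place of a real snapshot:

```shell
# sstableloader infers <keyspace>.<table> from the LAST TWO path components,
# so copy the snapshot files into a <keyspace>/<table>/ staging directory
# before loading. Paths here are illustrative.
SNAP=/tmp/demo/table3-7227e480/snapshots/snap1   # stand-in snapshot dir
STAGE=/tmp/restore/cass_testapp/table3           # ends in <keyspace>/<table>

mkdir -p "$SNAP" "$STAGE"
touch "$SNAP/mc-11-big-Data.db"                  # stand-in for real sstables
cp "$SNAP"/mc-11-big-* "$STAGE"/

# With the path ending in cass_testapp/table3, the loader resolves the right
# keyspace and table (command shown, not executed in this sketch):
echo sstableloader -d localhost "$STAGE"
```

Loading straight from .../snapshots/snap1/ fails because the loader then reads "snapshots" as the keyspace and "snap1" as the table, which matches the "table snapshots.table3 doesn't exist" message above.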
Re: Exception when running sstableloader
Hello LAD,

I do not know much about the SSTable loader. I carefully stayed away from it so far :). But it seems it's using Thrift to talk to Cassandra. Some of your rows might be too big, and increasing 'thrift_framed_transport_size_in_mb' should indeed have helped. Did you / Would you try increasing this as well: 'thrift_max_message_length_in_mb' and see what happens?

Cheers,
---
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Mon, 5 Nov 2018 at 18:00, Kalyan Chakravarthy wrote:

> I’m trying to migrate data between two clusters on different networks.
> Ports 7001, 7199, 9046 and 9160 are open between them, but port 7000 is not
> open. When I run the sstableloader command, I get the following exception.
>
> Command:
>
> :/a/cassandra/bin# ./sstableloader -d 192.168.98.99 /abc/cassandra/data/apps/ads-0fdd9ff0a7d711e89107ff9c3da22254
>
> Error/Exception:
>
> Could not retrieve endpoint ranges:
> org.apache.thrift.transport.TTransportException: Frame size (352518912)
> larger than max length (15728640)!
> [snip -- full stack trace in the original message below]
>
> In the yaml file, 'thrift_framed_transport_size_in_mb' is set to 15, so I
> increased its value to 40. Even after increasing
> 'thrift_framed_transport_size_in_mb' in the yaml file, I’m getting the same
> error.
>
> What could be the solution for this? Can somebody please help me with
> this?
>
> Cheers
> LAD
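Alain's suggestion amounts to raising both Thrift limits in cassandra.yaml on the nodes sstableloader talks to. A sketch of the two settings; the values are illustrative, and 'thrift_max_message_length_in_mb' only exists in older Cassandra versions (it was dropped from later 2.x yamls), so check your own cassandra.yaml before relying on it:

```yaml
# cassandra.yaml (Thrift-era settings; values are illustrative)
thrift_framed_transport_size_in_mb: 40
# Commonly set slightly larger than the framed transport size,
# if your Cassandra version still has this setting:
thrift_max_message_length_in_mb: 48
```

Both settings must be changed on the server side (and the nodes restarted); changing them only on the machine running sstableloader has no effect.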
Exception when running sstableloader
I’m trying to migrate data between two clusters on different networks. Ports 7001, 7199, 9046 and 9160 are open between them, but port 7000 is not open. When I run the sstableloader command, I get the following exception.

Command:

:/a/cassandra/bin# ./sstableloader -d 192.168.98.99 /abc/cassandra/data/apps/ads-0fdd9ff0a7d711e89107ff9c3da22254

Error/Exception:

Could not retrieve endpoint ranges:
org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:342)
at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:109)
Caused by: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)!
at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1368)
at org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1356)
at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:304)
... 2 more

In the yaml file, 'thrift_framed_transport_size_in_mb' is set to 15, so I increased its value to 40. Even after increasing 'thrift_framed_transport_size_in_mb' in the yaml file, I’m getting the same error.

What could be the solution for this? Can somebody please help me with this?

Cheers
LAD
Info about sstableloader
Hi, I’m new to Cassandra, please help me with sstableloader. Thank you in advance.

I’m trying to migrate data between two clusters which are on different networks, moving data from ‘c1’ to ‘c2’. Which one will be the source and which one will be the destination? And where should I run the sstableloader command, on c1 or c2?

Cheers
LAD
Re: [EXTERNAL] Re: Nodetool refresh v/s sstableloader
Thank you, everyone, for responding.

Rajath Subramanyam

On Thu, Aug 30, 2018 at 8:38 AM Carl Mueller wrote:

> - A range-aware compaction strategy that subdivides data by token range
> could help for this: you only back up data for the primary node and not the
> replica data.
> - Yes, if you want to use nodetool refresh as some sort of recovery
> solution, MAKE SURE YOU STORE THE TOKEN LIST with the
> sstables/snapshots/backups for the nodes.
Re: [EXTERNAL] Re: Nodetool refresh v/s sstableloader
- A range-aware compaction strategy that subdivides data by token range could help for this: you only back up data for the primary node and not the replica data.
- Yes, if you want to use nodetool refresh as some sort of recovery solution, MAKE SURE YOU STORE THE TOKEN LIST with the sstables/snapshots/backups for the nodes.

On Wed, Aug 29, 2018 at 8:57 AM Durity, Sean R wrote:

> Sstableloader, though, could require a lot more disk space – until
> compaction can reduce it. For example, if your RF=3, you will essentially be
> loading 3 copies of the data. Then it will get replicated 3 more times as
> it is being loaded. Thus, you could need up to 9x disk space.
>
> Sean Durity
RE: [EXTERNAL] Re: Nodetool refresh v/s sstableloader
Sstableloader, though, could require a lot more disk space – until compaction can reduce it. For example, if your RF=3, you will essentially be loading 3 copies of the data. Then it will get replicated 3 more times as it is being loaded. Thus, you could need up to 9x disk space.

Sean Durity

From: kurt greaves
Sent: Wednesday, August 29, 2018 7:26 AM
To: User
Subject: [EXTERNAL] Re: Nodetool refresh v/s sstableloader

Removing dev...

Nodetool refresh only picks up new SSTables that have been placed in the table's directory. It doesn't account for actual ownership of the data like SSTableloader does. Refresh will only work properly if the SSTables you are copying in are completely covered by that node's tokens. It doesn't work if there's a change in topology; replication and token ownership will have to be more or less the same.

SSTableloader will break up the SSTables and send the relevant bits to whichever node needs it, so no need for you to worry about tokens and copying data to the right places; it will do that for you.

On 28 August 2018 at 11:27, Rajath Subramanyam wrote:

Hi Cassandra users, Cassandra dev,

When recovering using SSTables from a snapshot, I want to know what are the key differences between using:
1. Nodetool refresh and,
2. SSTableloader

Does nodetool refresh have restrictions that need to be met? Does nodetool refresh work even if there is a change in the topology between the source cluster and the destination cluster? Does it work if the token ranges don't match between the source cluster and the destination cluster? Does it work when an old SSTable in the snapshot has a dropped column that is not part of the current schema?

I appreciate any help in advance.

Thanks,
Rajath

Rajath Subramanyam

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized.
If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.
Re: Nodetool refresh v/s sstableloader
Removing dev...

Nodetool refresh only picks up new SSTables that have been placed in the table's directory. It doesn't account for actual ownership of the data like SSTableloader does. Refresh will only work properly if the SSTables you are copying in are completely covered by that node's tokens. It doesn't work if there's a change in topology; replication and token ownership will have to be more or less the same.

SSTableloader will break up the SSTables and send the relevant bits to whichever node needs it, so no need for you to worry about tokens and copying data to the right places; it will do that for you.

On 28 August 2018 at 11:27, Rajath Subramanyam wrote:

> Hi Cassandra users, Cassandra dev,
>
> When recovering using SSTables from a snapshot, I want to know what are
> the key differences between using:
> 1. Nodetool refresh and,
> 2. SSTableloader
>
> Does nodetool refresh have restrictions that need to be met?
> Does nodetool refresh work even if there is a change in the topology
> between the source cluster and the destination cluster? Does it work if the
> token ranges don't match between the source cluster and the destination
> cluster? Does it work when an old SSTable in the snapshot has a dropped
> column that is not part of the current schema?
>
> I appreciate any help in advance.
>
> Thanks,
> Rajath
Nodetool refresh v/s sstableloader
Hi Cassandra users, Cassandra dev, When recovering using SSTables from a snapshot, I want to know what are the key differences between using: 1. Nodetool refresh and, 2. SSTableloader Does nodetool refresh have restrictions that need to be met? Does nodetool refresh work even if there is a change in the topology between the source cluster and the destination cluster? Does it work if the token ranges don't match between the source cluster and the destination cluster? Does it work when an old SSTable in the snapshot has a dropped column that is not part of the current schema? I appreciate any help in advance. Thanks, Rajath Rajath Subramanyam
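The distinction kurt describes above can be summarized as two restore paths. A minimal sketch, with host names, keyspace/table and paths assumed for illustration; commands are echoed via a helper rather than executed, since they only make sense against a live cluster:

```shell
# Helper that echoes commands; swap 'echo' for real execution on a cluster.
run() { echo "+ $*"; }

# (a) nodetool refresh: node-local. The sstables must already sit in the
# table's data directory on EACH node, and must fall within that node's
# token ranges -- topology/token changes between clusters break this.
run nodetool refresh cass_testapp table3

# (b) sstableloader: topology-aware. It reads the sstables, splits them by
# partition, and streams each piece to whichever replicas own it, so token
# ownership on the target cluster need not match the source.
run sstableloader -d 10.0.0.1,10.0.0.2 /backups/cass_testapp/table3
```

The trade-off, per the thread: refresh is cheap but brittle (store the token list with your backups); sstableloader is robust to topology change but temporarily multiplies disk usage, since RF copies get loaded and re-replicated.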
Re: Cassandra crashes after loading data with sstableloader
What’s the cardinality of hash? Do they have the same schema? If so you may be able to take a snapshot and hardlink it in / refresh instead of sstableloader. Alternatively you could drop the index from the destination keyspace and add it back in after the load finishes.

How big are the sstables? How big is your heap? Are you already serving traffic?

--
Jeff Jirsa

On Jul 29, 2018, at 3:43 PM, Rahul Singh wrote:

> What does “hash” data look like?
>
> Rahul
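Jeff's second suggestion (drop the secondary index before the bulk load, recreate it once the data is in) can be sketched as below. The index name, keyspace and host are assumptions, and the commands are echoed rather than executed:

```shell
run() { echo "+ $*"; }   # swap 'echo' for real execution

# Hypothetical index name: with CREATE INDEX ON message (hash), Cassandra
# typically generates message_hash_idx, but verify with DESCRIBE TABLE.
run cqlsh -e 'DROP INDEX keyspace2.message_hash_idx'

# Bulk load without the index in place, so no index writes happen per row.
run sstableloader -d 10.0.0.1 /snapshots/keyspace2/message

# Recreate the index afterwards; it is then built once, in the background.
run cqlsh -e 'CREATE INDEX message_hash_idx ON keyspace2.message (hash)'
```

This trades a one-time background index rebuild for not maintaining the index during the load, which is where the thread suspects the crash occurs.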
Re: Cassandra crashes after loading data with sstableloader
What does “hash” data look like?

Rahul

On Jul 24, 2018, 11:30 AM -0400, Arpan Khandelwal wrote:

> I need to clone data from one keyspace to another keyspace.
> We do it by taking a snapshot of keyspace1 and restoring it in keyspace2 using
> sstableloader.
Cassandra crashes after loading data with sstableloader
I need to clone data from one keyspace to another keyspace. We do it by taking a snapshot of keyspace1 and restoring it in keyspace2 using sstableloader.

Suppose we have the following table with an index on the hash column. The table has around 10M rows.

CREATE TABLE message (
    id uuid,
    messageid uuid,
    parentid uuid,
    label text,
    properties map,
    text1 text,
    text2 text,
    text3 text,
    category text,
    hash text,
    info map,
    creationtimestamp bigint,
    lastupdatedtimestamp bigint,
    PRIMARY KEY ( (id) )
);

CREATE INDEX ON message ( hash );

Cassandra crashes when I load data using sstableloader. The load is happening correctly, but it seems that Cassandra crashes when it's trying to build the index on a table with huge data.

I have two questions:
1. Is there any better way to clone a keyspace?
2. How can I optimize sstableloader to load data and not crash Cassandra while building the index?

Thanks
Arpan
Re: sstableloader from dse 4.8.4 to apache cassandra 3.11.1
Never mind, found it. It's not a supported version.

> On Jun 19, 2018, at 2:41 PM, rajpal reddy wrote:
>
> Hello,
>
> I’m trying to use sstableloader from DSE 4.8.4 (Cassandra 2.1.12) to Apache
> Cassandra 3.11.1 and I'm getting the below error, but it works fine when I use
> the sstableloader from DSE 5.1.2 (Apache Cassandra 3.11.0):
>
> Could not retrieve endpoint ranges:
> java.io.IOException: Failed to open transport to: host-ip:9160.
>
> Any workaround to use the sstableloader from DSE 4.8.4 (Apache Cassandra
> 2.1.12) with Apache Cassandra 3.11.1?
sstableloader from dse 4.8.4 to apache cassandra 3.11.1
Hello,

I’m trying to use sstableloader from DSE 4.8.4 (Cassandra 2.1.12) to Apache Cassandra 3.11.1 and I'm getting the below error, but it works fine when I use the sstableloader from DSE 5.1.2 (Apache Cassandra 3.11.0):

Could not retrieve endpoint ranges:
java.io.IOException: Failed to open transport to: host-ip:9160.

Any workaround to use the sstableloader from DSE 4.8.4 (Apache Cassandra 2.1.12) with Apache Cassandra 3.11.1?
Re: SSTableLoader Question
Sounds good. Thanks for the explanation!

On Sun, Feb 18, 2018 at 5:15 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:

> If you don’t have access to the file, you don’t have access to the file.
> I’ve seen this issue several times. It’s the easiest low-hanging fruit to
> resolve. So figure it out and make sure that it’s cassandra:cassandra from
> root to the data folder, and either run as root or sudo it.
>
> If it’s compacted it won’t be there, so you won’t have the file. I’m not
> aware of this event being communicated to sstableloader via SEDA. Besides,
> the sstable that you are loading SHOULD not be live. If you are streaming a
> live sstable, it means you are using sstableloader not as it is designed to
> be used - which is with static files.
Re: SSTableLoader Question
If you don’t have access to the file you don’t have access to the file. I’ve seen this issue several times. It’s he easiest low hanging fruit to resolve. So figure it out and make sure that it’s Cassandra.Cassandra from root to he Data folder and either run as root or sudo it. If it’s compacted it won’t be there so you won’t have the file. I’m not aware of this event being communicated to Sstableloader via SEDA. Besides, the sstable that you are loading SHOULD not be live. If you at streaming a life sstable, it means you are using sstableloader not as it is designed to be used - which is with static files. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 9:22 AM -0500, shalom sagges <shalomsag...@gmail.com>, wrote: > Not really sure with which user I ran it (root or cassandra), although I > don't understand why a permission issue will generate a File not Found > exception? > > And in general, what if a file is being streamed and got compacted before the > streaming ended. Does Cassandra know how to handle this? > > Thanks! > > > On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh <rahul.xavier.si...@gmail.com> > > wrote: > > > Check permissions maybe? Who owns the files vs. who is running > > > sstableloader. > > > > > > -- > > > Rahul Singh > > > rahul.si...@anant.us > > > > > > Anant Corporation > > > > > > On Feb 18, 2018, 4:26 AM -0500, shalom sagges <shalomsag...@gmail.com>, > > > wrote: > > > > Hi All, > > > > > > > > C* version 2.0.14. > > > > > > > > I was loading some data to another cluster using SSTableLoader. 
The > > > > streaming failed with the following error: > > > > > > > > Streaming error occurred > > > > java.lang.RuntimeException: java.io.FileNotFoundException: > > > > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file > > > > or directory) > > > > [stack trace snipped; the full trace is in the original message below] > > > > WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] > > > > Stream failed > > > > > > > > Did I miss something when running the load? Was the file suddenly > > > > missing due to compaction? > > > > If so, did I need to disable auto compaction or stop the service > > > > beforehand? (didn't find any reference to compaction in the docs) > > > > > > > > I know it's an old version, but I didn't find any related bugs on "File > > > > not found" exceptions. > > > > > > > > Thanks! > > > > >
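Rahul's theory (ownership/permissions on the source sstables) is cheap to test before rerunning the load. A minimal sketch, assuming a POSIX shell; the data path and the `cassandra` service user are assumptions, not facts from the thread:

```shell
# Verify the user who will run sstableloader can actually read every file
# in the source sstable directory. Reports any unreadable file; returns
# non-zero if one is found.
check_readable() {
  dir=$1
  bad=0
  for f in "$dir"/*; do
    [ -r "$f" ] || { echo "not readable: $f"; bad=1; }
  done
  return $bad
}

# Usage (hypothetical path and owner):
#   check_readable /data1/keyspace1/table1 \
#     || sudo chown -R cassandra:cassandra /data1/keyspace1/table1
```

Note this only catches permission problems; a file deleted by compaction mid-stream would pass this check and still fail later, which is why loading from a snapshot (see below in the thread) is the safer pattern.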
Re: SSTableLoader Question
Not really sure which user I ran it as (root or cassandra), although I don't understand why a permission issue would generate a File Not Found exception. And in general, what if a file that is being streamed gets compacted before the streaming ends? Does Cassandra know how to handle this? Thanks! On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote: > Check permissions maybe? Who owns the files vs. who is running > sstableloader. > > -- > Rahul Singh > rahul.si...@anant.us > > Anant Corporation > > On Feb 18, 2018, 4:26 AM -0500, shalom sagges <shalomsag...@gmail.com>, > wrote: > > Hi All, > > C* version 2.0.14. > > I was loading some data to another cluster using SSTableLoader. The > streaming failed with the following error: > > Streaming error occurred > java.lang.RuntimeException: java.io.FileNotFoundException: > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file > or directory) > [stack trace snipped; the full trace is in the original message below] > WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream > failed > > Did I miss something when running the load? Was the file suddenly missing > due to compaction? > If so, did I need to disable auto compaction or stop the service > beforehand? (didn't find any reference to compaction in the docs) > > I know it's an old version, but I didn't find any related bugs on "File > not found" exceptions. > > Thanks! > >
Re: SSTableLoader Question
Check permissions maybe? Who owns the files vs. who is running sstableloader. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 18, 2018, 4:26 AM -0500, shalom sagges <shalomsag...@gmail.com>, wrote: > Hi All, > > C* version 2.0.14. > > I was loading some data to another cluster using SSTableLoader. The streaming > failed with the following error: > > Streaming error occurred > java.lang.RuntimeException: java.io.FileNotFoundException: > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or > directory) > [stack trace snipped; the full trace is in the original message below] > WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream > failed > > > > Did I miss something when running the load? Was the file suddenly missing due > to compaction? > If so, did I need to disable auto compaction or stop the service beforehand? > (didn't find any reference to compaction in the docs) > > I know it's an old version, but I didn't find any related bugs on "File not > found" exceptions. > > Thanks! > >
SSTableLoader Question
Hi All, C* version 2.0.14. I was loading some data to another cluster using SSTableLoader. The streaming failed with the following error:

Streaming error occurred
java.lang.RuntimeException: java.io.FileNotFoundException: /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or directory)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
    at org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
    at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
    at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.FileNotFoundException: /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
    at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
    ... 8 more
WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream failed

Did I miss something when running the load? Was the file suddenly missing due to compaction? If so, did I need to disable auto compaction or stop the service beforehand? (didn't find any reference to compaction in the docs) I know it's an old version, but I didn't find any related bugs on "File not found" exceptions. Thanks!
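The compaction theory fits: `sstableloader` read a live data directory, and compaction deleted the sstable mid-stream. The usual remedy is to load from a snapshot rather than the live directory, because `nodetool snapshot` creates hard links, and a hard-linked copy survives deletion of the original. The hard-link mechanism can be demonstrated with plain files (paths here are illustrative, echoing the thread's naming):

```shell
# A snapshot directory entry is a hard link to the same inode, so
# "compaction" removing the live file leaves the snapshot copy intact.
tmp=$(mktemp -d)
echo "sstable bytes" > "$tmp/keyspace1-table1-jb-1-Data.db"
mkdir -p "$tmp/snapshots/bulkload"
ln "$tmp/keyspace1-table1-jb-1-Data.db" \
   "$tmp/snapshots/bulkload/keyspace1-table1-jb-1-Data.db"
rm "$tmp/keyspace1-table1-jb-1-Data.db"     # what compaction does to the live file
cat "$tmp/snapshots/bulkload/keyspace1-table1-jb-1-Data.db"  # snapshot copy still readable
```

In practice: `nodetool snapshot -t bulkload keyspace1`, then point `sstableloader` at the `snapshots/bulkload` directory instead of the live table directory.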
Trouble restoring with sstableloader
Hi all, I've been running into the following issue while trying to restore a C* database via sstableloader: Could not retrieve endpoint ranges: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)! java.lang.RuntimeException: Could not retrieve endpoint ranges: at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:283) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:95) Caused by: org.apache.thrift.transport.TTransportException: Frame size (352518912) larger than max length (15728640)! at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137) at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1327) at org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1315) at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:256) ... 2 more This seems odd since the frame size thrift is asking for is over 336 MB. This is happening using Cassandra 2.0.12 | Thrift protocol 19.39.0 Any advice? Thanks! --Jim
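The sizes in the exception are worth sanity-checking, since the "over 336 MB" figure is itself a diagnostic clue:

```shell
# Converting the two byte counts from the TTransportException to MB.
frame=352518912   # frame size the client tried to read
max=15728640      # Thrift's default max frame length (15 MB)
echo "requested frame: $((frame / 1024 / 1024)) MB"
echo "server max:      $((max / 1024 / 1024)) MB"
```

An absurdly large "frame size" like this often means the first bytes read off the socket were never a Thrift frame header at all, e.g. the loader connected to the wrong port or to an SSL-enabled endpoint without SSL. If the response genuinely is large, the server-side cap is (if memory serves) `thrift_framed_transport_size_in_mb` in cassandra.yaml; verify the setting name for your version before changing it.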
sstableloader out of memory
Hi all, We're trying to load a snapshot back into a cluster, but are running into memory issues. We've got about 190GB of data across 11 sstable-generations. Some of the smaller ones load, but the larger ones don't. We've tried increasing the max-heap-size to 16G, but still see this exception: sstableloader -d cass1 /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372 Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19968-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19930-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19966-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19960-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19944-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-9639-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19964-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-18879-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19965-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19967-Data.db /snapshot_data/keyspace1/cf1-2195c1a0bc1011e69b699bbcfdee6372/keyspace1-cf1-ka-19959-Data.db to [] Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:257) at org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:70) at org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:58) at org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:378) at org.apache.cassandra.streaming.StreamCoordinator.transferFiles(StreamCoordinator.java:147) at org.apache.cassandra.streaming.StreamPlan.transferFiles(StreamPlan.java:144) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:185) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) Has anyone run into this before? The next steps we're going to try are running sstableloader on each generation individually (suspecting that it's trying to open all 11 generations at the same time). If that doesn't work we'll try sstablesplit, but we aren't that confident that would help, since it probably uses the same code to read the sstables as sstableloader and would also run out of memory. Thanks, Nathan
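The per-generation approach described above can be scripted. This is a hypothetical helper (the `stage_by_generation` name and `gen-N` directory layout are my own, not from the thread), assuming filenames follow the `<ks>-<cf>-ka-<generation>-<Component>.db` pattern seen in the listing:

```shell
# Move each sstable generation into its own staging directory so a
# separate sstableloader run only has to open one generation at a time.
stage_by_generation() {
  src=$1 dest=$2
  for f in "$src"/*-Data.db; do
    gen=$(basename "$f" | sed -E 's/.*-ka-([0-9]+)-Data\.db/\1/')
    mkdir -p "$dest/gen-$gen"
    # Move every component of the generation (Data, Index, Summary, ...),
    # not just the Data file.
    mv "$src"/*-ka-"$gen"-* "$dest/gen-$gen"/
  done
}
# then, roughly: for d in /staging/gen-*; do sstableloader -d cass1 "$d"; done
```

One caveat: `sstableloader` infers keyspace and table from the last two path components, so in practice each staging directory should end in `.../<keyspace>/<table>` rather than the `gen-N` name used here for illustration.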
sstableloader limitations in multi-dc cluster
I'm trying to use sstableloader to bulk load some data to my 4 DC cluster, and I can't quite get it to work. Here is how I'm trying to run it: sstableloader -d 127.0.0.1 -i {csv list of private ips of nodes in cluster} myks/mttest At first this seems to work, with a steady stream of logging like this (eventually getting to 100%): progress: [/10.0.1.225]0:13/13 100% [/10.0.0.134]0:13/13 100% [/10.0.0.119]0:13/13 100% [/10.0.1.26]0:13/13 100% [/10.0.3.188]0:13/13 100% [/10.0.3.189]0:13/13 100% [/10.0.2.95]0:13/13 100% total: 100% 0.000KiB/s (avg: 13.857MiB/s) There will be some errors sprinkled in like this: ERROR 15:35:43 [Stream #707f0920-5760-11e7-8ede-37de75ac1efa] Streaming error occurred on session with peer 10.0.2.9 java.net.NoRouteToHostException: No route to host Then, at the end, there will be one last warning about the failed streams: WARN 15:38:03 [Stream #707f0920-5760-11e7-8ede-37de75ac1efa] Stream failed Streaming to the following hosts failed: [/127.0.0.1, {list of same private ips as above}] I am perplexed about the failures because I am trying to explicitly ignore the nodes in remote DC's via the -i option to sstableloader. Why doesn't this work? I've tried using the public IP's instead just for kicks, but that doesn't change anything. I don't see anything helpful in the cassandra logs (including debug logs). Also, why is localhost in the list of failures? I can query the data locally after the sstableloader command completes. I've also noticed that sstableloader fails completely (even locally) while I am decomissioning or bootstrapping a node in a remote DC. Is this a limitation of sstableloader? I haven't been able to find documentation about this.
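One thing worth ruling out before blaming the `-i` option: `java.net.NoRouteToHostException` is an OS-level routing/firewall failure, raised before Cassandra logic is ever involved. Streaming goes to the storage port (7000 by default, or the SSL storage port). A quick pre-flight reachability loop, sketched here using bash's `/dev/tcp` (bash-specific; the port and IPs are assumptions to adapt):

```shell
# Report whether each target host accepts TCP connections on the
# streaming (storage) port, independently of sstableloader.
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 unreachable"
  fi
}
# e.g.: for ip in 10.0.1.225 10.0.2.9; do check_port "$ip" 7000; done
```

If the remote-DC nodes show as unreachable here, the stream failures are a connectivity problem regardless of what `-i` does or does not exclude.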
Re: sstableloader making no progress
Adding to the above, each host shows the following log messages that, despite being at INFO level, appear like stack traces to me: 2017-02-13 15:09:22,166 INFO [STREAM-INIT-/10.128.X.Y:60306] StreamResultFuture.java:116 - [Stream #afe548d0-f230-11e6-bc5d-8f99f25bfcf7, ID#0] Received streaming plan for Bulk Load at clojure.lang.Var.invoke(Var.java:401) at opsagent.config_service$update_system$fn__20140.invoke(config_service.clj:205) at clojure.core$reduce.invoke(core.clj:6518) at clojure.lang.RestFn.invoke(RestFn.java:425) at opsagent.config_service$fn__20217$fn__20218$state_machine__4128__auto20219$fn__20221.invoke(config_service.clj:250) at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:940) at clojure.core.async$ioc_alts_BANG_$fn__4293.invoke(async.clj:362) at clojure.core.async.impl.channels.ManyToManyChannel$fn__624.invoke(channels.clj:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.lang.Thread.run(Thread.java:745) 2017-02-13 15:09:22,208 INFO [STREAM-IN-/10.128.X.Y] StreamResultFuture.java:166 - [Stream #afe548d0-f230-11e6-bc5d-8f99f25bfcf7 ID#0] Prepare completed. Receiving 3 files(3963 bytes), sending 0 files(0 bytes) at clojure.lang.ArraySeq.reduce(ArraySeq.java:114) at opsagent.config_service$update_system.doInvoke(config_service.clj:199) at opsagent.config_service$start_system_BANG_.invoke(config_service.clj:224) at opsagent.config_service$fn__20217$fn__20218$state_machine__4128__auto20219.invoke(config_service.clj:247) at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:944) at clojure.core.async$do_alts$fn__4247$fn__4250.invoke(async.clj:231) at clojure.lang.AFn.run(AFn.java:22) Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini On Fri, Feb 10, 2017 at 4:28 PM, Simone Franzini <captainfr...@gmail.com> wrote: > I am trying to ingest some data from a cluster to a different cluster via > sstableloader. 
I am running DSE 4.8.7 / Cassandra 2.1.14. > I have re-created the schemas and followed other instructions here: > https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/ > toolsBulkloader_t.html > > I am initially testing the ingest process with a single table, containing > 3 really small sstables (just a few KB each): > sstableloader -v -d / > From the console, it appears that the progress quickly reaches 100%, but > the command never returns: > progress: [/10.128.X.Y]0:3/3 100% [/10.192.Z.W]0:3/3 100% ... total: 100% > 0 MB/s(avg: 0 MB/s) > > nodetool netstats shows that there is no progress: > Mode: NORMAL > Bulk Load e495cea0-efde-11e6-9ec0-8f99f25bfcf7 > /10.128.X.Y > Receiving 3 files, 3963 bytes total. Already received 0 files, 0 > bytes total > Bulk Load b2566980-efb7-11e6-a467-8f99f25bfcf7 > /10.128.X.Y > Receiving 3 files, 3963 bytes total. Already received 0 files, 0 > bytes total > Bulk Load f31e7810-efdd-11e6-8484-8f99f25bfcf7 > /10.128.X.Y > Receiving 3 files, 3963 bytes total. Already received 0 files, 0 > bytes total > ... 
> Read Repair Statistics: > Attempted: 8 > Mismatch (Blocking): 0 > Mismatch (Background): 0 > Pool NameActive Pending Completed > Commandsn/a 02148112 > Responses n/a 0 977176 > > > The logs show the following, but no error or warning message: > 2017-02-10 16:18:49,096 INFO [STREAM-INIT-/10.128.X.Y:33302] > StreamResultFuture.java:109 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 > ID#0] Creating new streaming plan for Bulk Load > 2017-02-10 16:18:49,105 INFO [STREAM-INIT-/10.128.X.Y:33302] > StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, > ID#0] Received streaming plan for Bulk Load > 2017-02-10 16:18:49,110 INFO [STREAM-INIT-/10.128.X.Y:33306] > StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, > ID#0] Received streaming plan for Bulk Load > 2017-02-10 16:18:49,110 INFO [STREAM-IN-/10.128.X.Y] > StreamResultFuture.java:166 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 > ID#0] Prepare completed. Receiving 3 files(3963 bytes), sending 0 files(0 > bytes) > > > Any help would be greatly appreciated. > > Simone Franzini, PhD > > http://www.linkedin.com/in/simonefranzini >
sstableloader making no progress
I am trying to ingest some data from a cluster to a different cluster via sstableloader. I am running DSE 4.8.7 / Cassandra 2.1.14. I have re-created the schemas and followed other instructions here: https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html I am initially testing the ingest process with a single table, containing 3 really small sstables (just a few KB each): sstableloader -v -d / From the console, it appears that the progress quickly reaches 100%, but the command never returns: progress: [/10.128.X.Y]0:3/3 100% [/10.192.Z.W]0:3/3 100% ... total: 100% 0 MB/s(avg: 0 MB/s) nodetool netstats shows that there is no progress: Mode: NORMAL Bulk Load e495cea0-efde-11e6-9ec0-8f99f25bfcf7 /10.128.X.Y Receiving 3 files, 3963 bytes total. Already received 0 files, 0 bytes total Bulk Load b2566980-efb7-11e6-a467-8f99f25bfcf7 /10.128.X.Y Receiving 3 files, 3963 bytes total. Already received 0 files, 0 bytes total Bulk Load f31e7810-efdd-11e6-8484-8f99f25bfcf7 /10.128.X.Y Receiving 3 files, 3963 bytes total. Already received 0 files, 0 bytes total ... Read Repair Statistics: Attempted: 8 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool Name Active Pending Completed Commands n/a 0 2148112 Responses n/a 0 977176 The logs show the following, but no error or warning message: 2017-02-10 16:18:49,096 INFO [STREAM-INIT-/10.128.X.Y:33302] StreamResultFuture.java:109 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 ID#0] Creating new streaming plan for Bulk Load 2017-02-10 16:18:49,105 INFO [STREAM-INIT-/10.128.X.Y:33302] StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, ID#0] Received streaming plan for Bulk Load 2017-02-10 16:18:49,110 INFO [STREAM-INIT-/10.128.X.Y:33306] StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7, ID#0] Received streaming plan for Bulk Load 2017-02-10 16:18:49,110 INFO [STREAM-IN-/10.128.X.Y] StreamResultFuture.java:166 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7 ID#0] Prepare completed. Receiving 3 files(3963 bytes), sending 0 files(0 bytes) Any help would be greatly appreciated. Simone Franzini, PhD http://www.linkedin.com/in/simonefranzini
Re: [Marketing Mail] Re: [Marketing Mail] Re: sstableloader question
Hello, It's about 2500 sstables worth 25TB of data. The -t parameter doesn't change anything; I tried both -t 1000 and -t 1. Most probably I'm hitting some limitation at the target cluster. I'm preparing to split the sstables and run up to ten parallel sstableloader sessions. Regards, Osman On 11-10-2016 21:46, Rajath Subramanyam wrote: How many sstables are you trying to load? Running sstableloaders in parallel will help. Did you try setting the "-t" parameter and see if you are getting the expected throughput? - Rajath Rajath Subramanyam On Mon, Oct 10, 2016 at 2:02 PM, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: Hello, Thank you Adam and Rajath. I'll split the input sstables and run parallel jobs for each. I tested this approach and ran 3 parallel sstableloader jobs without the -t parameter. I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 to 600 Mbit/sec on all of the target nodes. But each job runs at about 10MB/sec only and generates about 100Mbit/sec of network traffic. In total this could be much more. Source and target servers have plenty of unused cpu, io and network resource. Do you have any idea how I can increase the speed of the sstableloader job? Regards, Osman On 10-10-2016 22:05, Rajath Subramanyam wrote: Hi Osman, You cannot restart the streaming only to the failed nodes specifically. You can restart the sstableloader job itself. Compaction will eventually take care of the redundant rows. - Rajath Rajath Subramanyam On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: It'll start over from the beginning. On Sunday, October 9, 2016, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: Hello, I have a running sstableloader job. Unfortunately some of the nodes restarted since streaming began.
I see streaming stopped for those nodes. Can I restart that streaming somehow? Or if I restart the sstableloader job, will it start from the beginning? Regards, Osman This e-mail message, including any attachments, is for the sole use of the person to whom it has been sent, and may contain information that is confidential or legally protected. If you are not the intended recipient or have received this message in error, you are not authorized to copy, distribute, or otherwise use this message or its attachments. Please notify the sender immediately by return e-mail and permanently delete this message and any attachments. KRON makes no warranty that this e-mail is error or virus free. -- Adam Hutson Data Architect | DataScale +1 (417) 224-5212 a...@datascale.io
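A back-of-envelope check of the numbers in this exchange helps localize the bottleneck (the interpretation is mine, not Osman's):

```shell
# Three loaders at ~10 MB/s each: how does the aggregate compare with the
# 600 Mbit/s stream_throughput_outbound_megabits_per_sec cap on the targets?
mb_per_sec=10
jobs=3
echo "per job:   $(( mb_per_sec * 8 )) Mbit/s"
echo "aggregate: $(( mb_per_sec * 8 * jobs )) Mbit/s"
```

At roughly 240 Mbit/s aggregate, the jobs sit well under the raised 600 Mbit/s server-side cap, which suggests a per-session limit (loader-side throttling, single-stream socket/CPU throughput, or a cap on some intermediate hop) rather than `stream_throughput_outbound_megabits_per_sec` itself.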
Re: [Marketing Mail] Re: sstableloader question
How many sstables are you trying to load? Running sstableloaders in parallel will help. Did you try setting the "-t" parameter and see if you are getting the expected throughput? - Rajath Rajath Subramanyam On Mon, Oct 10, 2016 at 2:02 PM, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: > Hello, > > Thank you Adam and Rajath. > > I'll split the input sstables and run parallel jobs for each. > I tested this approach and ran 3 parallel sstableloader jobs without the -t > parameter. > I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 > to 600 Mbit/sec on all of the target nodes. > But each job runs at about 10MB/sec only and generates about 100Mbit/sec of > network traffic. > In total this could be much more. Source and target servers have plenty of > unused cpu, io and network resource. > Do you have any idea how I can increase the speed of the sstableloader job? > > Regards, > Osman > > On 10-10-2016 22:05, Rajath Subramanyam wrote: > Hi Osman, > > You cannot restart the streaming only to the failed nodes specifically. > You can restart the sstableloader job itself. Compaction will eventually > take care of the redundant rows. > > - Rajath > > > Rajath Subramanyam > > > On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: > It'll start over from the beginning. > > > On Sunday, October 9, 2016, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: > Hello, > > I have a running sstableloader job. > Unfortunately some of the nodes restarted since streaming began. > I see streaming stopped for those nodes. > Can I restart that streaming somehow? > Or if I restart the sstableloader job, will it start from the beginning? > > Regards, > Osman > > -- > > Adam Hutson > Data Architect | DataScale > +1 (417) 224-5212 > a...@datascale.io >
Re: [Marketing Mail] Re: sstableloader question
Hello, Thank you Adam and Rajath. I'll split the input sstables and run parallel jobs for each. I tested this approach and ran 3 parallel sstableloader jobs without the -t parameter. I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 to 600 Mbit/sec on all of the target nodes. But each job runs at about 10MB/sec only and generates about 100Mbit/sec of network traffic. In total this could be much more. Source and target servers have plenty of unused cpu, io and network resource. Do you have any idea how I can increase the speed of the sstableloader job? Regards, Osman On 10-10-2016 22:05, Rajath Subramanyam wrote: Hi Osman, You cannot restart the streaming only to the failed nodes specifically. You can restart the sstableloader job itself. Compaction will eventually take care of the redundant rows. - Rajath Rajath Subramanyam On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: It'll start over from the beginning. On Sunday, October 9, 2016, Osman YOZGATLIOGLU <osman.yozgatlio...@krontech.com> wrote: Hello, I have a running sstableloader job. Unfortunately some of the nodes restarted since streaming began. I see streaming stopped for those nodes. Can I restart that streaming somehow? Or if I restart the sstableloader job, will it start from the beginning? Regards, Osman -- Adam Hutson Data Architect | DataScale +1 (417) 224-5212 a...@datascale.io
Re: sstableloader question
Hi Osman, You cannot restart the streaming only to the failed nodes specifically. You can restart the sstableloader job itself. Compaction will eventually take care of the redundant rows. - Rajath Rajath Subramanyam On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson <a...@datascale.io> wrote: > It'll start over from the beginning. > > On Sunday, October 9, 2016, Osman YOZGATLIOGLU < > osman.yozgatlio...@krontech.com> wrote: > >> Hello, >> >> I have a running sstableloader job. >> Unfortunately some of the nodes restarted after streaming began. >> I see streaming has stopped for those nodes. >> Can I restart that streaming somehow? >> Or if I restart the sstableloader job, will it start from the beginning? >> >> Regards, >> Osman >> > > -- > > Adam Hutson > Data Architect | DataScale > +1 (417) 224-5212 > a...@datascale.io >
Re: sstableloader question
It'll start over from the beginning. On Sunday, October 9, 2016, Osman YOZGATLIOGLU < osman.yozgatlio...@krontech.com> wrote: > Hello, > > I have a running sstableloader job. > Unfortunately some of the nodes restarted after streaming began. > I see streaming has stopped for those nodes. > Can I restart that streaming somehow? > Or if I restart the sstableloader job, will it start from the beginning? > > Regards, > Osman > -- Adam Hutson Data Architect | DataScale +1 (417) 224-5212 a...@datascale.io
sstableloader question
Hello, I have a running sstableloader job. Unfortunately some of the nodes restarted after streaming began. I see streaming has stopped for those nodes. Can I restart that streaming somehow? Or if I restart the sstableloader job, will it start from the beginning? Regards, Osman
Re: sstableloader
Thank you for your answer Kai. On 17 Aug 2016, at 11:34, Kai Wang <dep...@gmail.com> wrote: yes, you are correct. On Tue, Aug 16, 2016 at 2:37 PM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote: Hi, I’m using Cassandra 3.7. In the documentation for sstableloader I read the following: << Note: To get the best throughput from SSTable loading, you can use multiple instances of sstableloader to stream across multiple machines. No hard limit exists on the number of SSTables that sstableloader can run at the same time, so you can add additional loaders until you see no further improvement.>> Does this mean that I can stream my sstables to my cluster from many instances of sstableloader running simultaneously on many client machines? I ask because I would like to improve the transfer speed of my sstables to my cluster. Kind regards and thanks for your comments. Jean
Re: sstableloader
yes, you are correct. On Tue, Aug 16, 2016 at 2:37 PM, Jean Tremblay < jean.tremb...@zen-innovations.com> wrote: > Hi, > > I’m using Cassandra 3.7. > > In the documentation for sstableloader I read the following: > > << Note: To get the best throughput from SSTable loading, you can use > multiple instances of sstableloader to stream across multiple machines. No > hard limit exists on the number of SSTables that sstableloader can run at > the same time, so you can add additional loaders until you see no further > improvement.>> > > Does this mean that I can stream my sstables to my cluster from many > instances of sstableloader running simultaneously on many client machines? > > I ask because I would like to improve the transfer speed of my sstables to > my cluster. > > Kind regards and thanks for your comments. > > Jean >
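As a rough sketch of how SSTables might be divided between loader instances, the round-robin assignment below uses placeholder basenames; in practice each "slot" would be a directory holding complete SSTable component sets (Data, Index, Summary, ...), each fed to its own sstableloader process:

```shell
# Assign SSTable sets round-robin to N loader slots. The basenames are
# placeholders for real component sets such as la-1-big-Data.db.
N=3
slot=0
assignments=""
for sstable in la-1-big la-2-big la-3-big la-4-big la-5-big; do
  assignments="$assignments $sstable:loader$((slot + 1))"
  slot=$(( (slot + 1) % N ))
done
echo "$assignments"
# Each loader slot then runs its own process against its own directory:
#   sstableloader -d ip1,ip2,ip3 loader1/keyspace1/table1 &
```

Splitting by whole SSTables keeps each component set intact, which sstableloader requires.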
Re: Restoring Incremental Backups without using sstableloader
Hi, Well, you can do it by copying/pasting all the sstables as described in the link you gave, as long as your token range distribution has not changed since you took the snapshots and you have a way to be sure which node each sstable belongs to. Make sure that snapshots taken on node X indeed go back to node X. If you do not have information on where each sstable comes from, or if you added / removed nodes, then using sstableloader is probably a good idea. If you really don't like sstableloader (not sure why), you can paste all the sstables to all the nodes and then run nodetool refresh + nodetool cleanup. But in most cases all the data won't fit on one node, plus you might have identical sstable names that you'll have to handle. Hope that helps, C*heers, --- Alain Rodriguez - al...@thelastpickle.com France The Last Pickle - Apache Cassandra Consulting http://www.thelastpickle.com 2016-05-17 11:14 GMT+01:00 Ravi Teja A V <avt...@gmail.com>: > Hi everyone > > I am currently working with Cassandra 3.5. I would like to know if it is > possible to restore backups without using sstableloader. I have been > referring to the following pages in the datastax documentation: > > https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsBackupSnapshotRestore.html > Thank you. > > Yours sincerely > RAVI TEJA A V >
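The copy-back alternative described above could look roughly like this; the keyspace, table, backup path and the UUID-suffixed data directory are all placeholders, and the snippet only assembles the steps as text (a dry run) rather than executing them:

```shell
# Dry-run sketch: restore snapshot sstables without sstableloader by
# copying them back to the SAME node they were taken from. All paths and
# the table directory UUID are placeholders.
ks=keyspace1
tbl=table1
datadir="/var/lib/cassandra/data/$ks/$tbl-8bcd2300d0d011e5a3ab233f92747e94"
steps="cp /backups/$ks/$tbl/* $datadir/
nodetool refresh $ks $tbl
nodetool cleanup $ks"  # cleanup matters mainly if sstables were pasted to every node
printf '%s\n' "$steps"
```

nodetool refresh makes the node pick up sstables dropped into its data directory; cleanup then discards rows outside the node's token ranges.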
Does sstableloader still use gossip?
Hi, in the docs it still says that sstableloader uses gossip ( https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html http://docs.datastax.com/en/cassandra/3.x/cassandra/tools/toolsBulkloader.html ) but this blog post ( http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated) says „sstableloader no longer participates in gossip membership to get schema and ring information.“ While the blog post makes total sense, I wonder why it's still in the docs. Is a correctly configured cassandra.yaml necessary to use sstableloader, or are the hosts specified with the -d option enough? Thanks -- Matthias Niehoff | IT-Consultant | Agile Software Factory | Consulting codecentric AG | Zeppelinstr 2 | 76185 Karlsruhe | Deutschland tel: +49 (0) 721.9595-681 | fax: +49 (0) 721.9595-666 | mobil: +49 (0) 172.1702676 www.codecentric.de | blog.codecentric.de | www.meettheexperts.de | www.more4fi.de Sitz der Gesellschaft: Solingen | HRB 25917 | Amtsgericht Wuppertal Vorstand: Michael Hochgürtel . Mirko Novakovic . Rainer Vehns Aufsichtsrat: Patric Fedlmeier (Vorsitzender) . Klaus Jäger . Jürgen Schütz This e-mail, including any attached files, contains confidential and/or legally protected information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail and any attached files. Unauthorized copying, use or opening of attached files, as well as unauthorized forwarding of this e-mail, is not permitted.
Re: sstableloader: Stream failed
Thanks for the hint! Indeed I could not telnet to the host. It was the listen_address that was not properly configured. Thanks again! Ralf > On 23.05.2016, at 21:01, Paulo Motta <pauloricard...@gmail.com> wrote: > > Can you telnet 10.211.55.8 7000? This is the port used for streaming > communication with the destination node. > > If not you should check what is the configured storage_port in the > destination node and set that in the cassandra.yaml of the source node so > it's picked up by sstableloader. >
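The reachability check that solved this can be scripted instead of using telnet interactively. A small sketch (requires bash for the /dev/tcp device; the host and port are the examples from this thread):

```shell
# Probe whether a TCP port is reachable via bash's /dev/tcp device
# (bash must be installed). Prints "open" or "closed".
probe() {
  if bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}
# From the sstableloader host, the destination's storage_port must report
# open, e.g.: probe 10.211.55.8 7000
```

If the probe reports closed, check listen_address, storage_port and any firewall between the loader host and the node.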
Re: sstableloader: Stream failed
Can you telnet 10.211.55.8 7000? This is the port used for streaming communication with the destination node. If not, you should check what the configured storage_port is on the destination node and set that in the cassandra.yaml of the source node so it's picked up by sstableloader. 2016-05-23 10:48 GMT-03:00 Ralf Steppacher <ralf.viva...@gmail.com>: > Hello, > > I am trying to load the SSTables (from a Titan graph keyspace) of a > one-node-cluster (C* v2.2.6) into another node, but I cannot figure out how > to properly use the sstableloader. The target keyspace and table exist in > the target node. If they do not exist I get a proper error message telling > me so. > Providing a cassandra.yaml or not makes no difference. > The listen_address and rpc_address values in the cassandra.yaml, if > provided, do not seem to matter (at least the error is always the same). > Running sstableloader on the C* node itself or another host makes no > difference. > Truncating all tables before attempting to load the data makes no > difference. > > The node is up and running: > INFO 13:41:18 Starting listening for CQL clients on /10.211.55.8:9042... > INFO 13:41:18 Binding thrift service to /10.211.55.8:9160 > INFO 13:41:18 Listening for thrift clients... > > > The error I am getting is this: > > $ ./sstableloader -d 10.211.55.8 -f ../conf/cassandra.yaml -v ~/Downloads/ > ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/ > objc[18941]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/bin/java > and > /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre/lib/libinstrument.dylib. > One of the two will be used. Which one is undefined.
> Established connection to initial hosts > Opening sstables and calculating sections to stream > Streaming relevant part of > /Users/rsteppac/Downloads/ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/la-1-big-Data.db > to [/10.211.55.8] > ERROR 12:57:24 [Stream #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Streaming > error occurred > java.net.ConnectException: Connection refused > at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_77] > at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_77] > at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_77] > at > sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > ~[na:1.8.0_77] > at java.nio.channels.SocketChannel.open(SocketChannel.java:189) > ~[na:1.8.0_77] > at > org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:248) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:83) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:235) > ~[apache-cassandra-2.2.6.jar:2.2.6] > at > org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) > [apache-cassandra-2.2.6.jar:2.2.6] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_77] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_77] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77] > progress: total: 100% 0 MB/s(avg: 0 MB/s)WARN 12:57:24 [Stream > #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Stream failed > Streaming to the following hosts failed: > [/10.211.55.8] > java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream 
failed > at > com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > at > com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:115) > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > at > com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) > at > com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) > at > com.
sstableloader: Stream failed
Hello, I am trying to load the SSTables (from a Titan graph keyspace) of a one-node-cluster (C* v2.2.6) into another node, but I cannot figure out how to properly use the sstableloader. The target keyspace and table exist in the target node. If they do not exist I get a proper error message telling me so. Providing a cassandra.yaml or not makes no difference. The listen_address and rpc_address values in the cassandra.yaml, if provided, do not seem to matter (at least the error is always the same). Running sstableloader on the C* node itself or another host makes no difference. Truncating all tables before attempting to load the data makes no difference. The node is up and running: INFO 13:41:18 Starting listening for CQL clients on /10.211.55.8:9042... INFO 13:41:18 Binding thrift service to /10.211.55.8:9160 INFO 13:41:18 Listening for thrift clients... The error I am getting is this: $ ./sstableloader -d 10.211.55.8 -f ../conf/cassandra.yaml -v ~/Downloads/ ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/ objc[18941]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/bin/java and /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre/lib/libinstrument.dylib. One of the two will be used. Which one is undefined.
Established connection to initial hosts Opening sstables and calculating sections to stream Streaming relevant part of /Users/rsteppac/Downloads/ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/la-1-big-Data.db to [/10.211.55.8] ERROR 12:57:24 [Stream #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Streaming error occurred java.net.ConnectException: Connection refused at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_77] at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_77] at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_77] at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_77] at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_77] at org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:248) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:83) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:235) ~[apache-cassandra-2.2.6.jar:2.2.6] at org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212) [apache-cassandra-2.2.6.jar:2.2.6] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77] progress: total: 100% 0 MB/s(avg: 0 MB/s)WARN 12:57:24 [Stream #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Stream failed Streaming to the following hosts failed: [/10.211.55.8] java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:115) Caused by: org.apache.cassandra.streaming.StreamException: Stream failed at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210) at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186) at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:434) at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:529) at org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:241
Restoring Incremental Backups without using sstableloader
Hi everyone I am currently working with Cassandra 3.5. I would like to know if it is possible to restore backups without using sstableloader. I have been referring to the following pages in the datastax documentation: https://docs.datastax.com/en/cassandra/3.x/cassandra/operations/opsBackupSnapshotRestore.html Thank you. Yours sincerely RAVI TEJA A V
Re: sstableloader throughput
On Mon, Jan 11, 2016 at 10:25 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote: > > Make sure streaming throughput isn’t throttled on the destination cluster. > How do I do that? Is stream_throughput_outbound_megabits_per_sec the attribute in cassandra.yaml? I think we can set that on the fly using nodetool setstreamthroughput. I ran nodetool setstreamthroughput 0 on the target machine, but that doesn't improve the average throughput. Thanks and Regards Noorul > Stream from more machines (divide sstables between a bunch of machines, run > in parallel). > > > > > > > > On 1/11/16, 5:21 AM, "Noorul Islam K M" <noo...@noorul.com> wrote: > >> >>I have a need to stream data to a new cluster using sstableloader. I >>spawned a machine with 32 cores assuming that sstableloader scaled with >>respect to cores, but that doesn't appear to be the case. >> >>I am getting an average throughput of 18 MB/s, which seems to be pretty >>low (I might be wrong). >> >>Is there any way to increase the throughput? OpsCenter data on the target >>cluster shows very few write requests per second. >> >>Thanks and Regards >>Noorul
sstableloader throughput
I have a need to stream data to a new cluster using sstableloader. I spawned a machine with 32 cores assuming that sstableloader scaled with respect to cores, but that doesn't appear to be the case. I am getting an average throughput of 18 MB/s, which seems to be pretty low (I might be wrong). Is there any way to increase the throughput? OpsCenter data on the target cluster shows very few write requests per second. Thanks and Regards Noorul
Re: sstableloader throughput
Make sure streaming throughput isn’t throttled on the destination cluster. Stream from more machines (divide sstables between a bunch of machines, run in parallel). On 1/11/16, 5:21 AM, "Noorul Islam K M" <noo...@noorul.com> wrote: > >I have a need to stream data to a new cluster using sstableloader. I >spawned a machine with 32 cores assuming that sstableloader scaled with >respect to cores, but that doesn't appear to be the case. > >I am getting an average throughput of 18 MB/s, which seems to be pretty >low (I might be wrong). > >Is there any way to increase the throughput? OpsCenter data on the target >cluster shows very few write requests per second. > >Thanks and Regards >Noorul
Re: why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
You only need the patched sstableloader. You don't have to upgrade your Cassandra servers at all. So, 1. fetch the latest cassandra-2.1 source $ git clone https://git-wip-us.apache.org/repos/asf/cassandra.git $ cd cassandra $ git checkout origin/cassandra-2.1 2. build it $ ant 3. use the sstableloader you just built $ bin/sstableloader On Mon, Dec 28, 2015 at 6:03 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: > hi, Yuki > Thank you very much! > The issue's description almost fits my case! > 1. My Cassandra version is 2.1.11 > 2. my table has several columns with collection types > 3. Before it failed this time, I could use sstableloader to load data > into this table, but > I got this error after I dropped one column with a collection type and > added a column with int type > Do you think it will resolve my problem if I update the version to > 2.1.13? > > Also, my table already has 560 million records. So, to resolve this, > do I only need to update to the new version's C*.jar > and restart Cassandra? > > Dillon > > 2015-12-29 7:36 GMT+08:00 Yuki Morishita <mor.y...@gmail.com>: >> >> This is a known issue. >> >> https://issues.apache.org/jira/browse/CASSANDRA-10700 >> >> It is fixed in the not-yet-released version 2.1.13. >> So, you need to build from the latest cassandra-2.1 branch to try it.
>> >> >> On Mon, Dec 28, 2015 at 5:28 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: >> > hi, all >> > I used the sstableloader many times successfully, but I got the >> > following >> > error: >> > >> > [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user >> > -pw >> > password -v -d 172.21.0.131 ./currentdata/keyspace/table >> > >> > Could not retrieve endpoint ranges: >> > java.lang.IllegalArgumentException >> > java.lang.RuntimeException: Could not retrieve endpoint ranges: >> > at >> > >> > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338) >> > at >> > >> > org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156) >> > at >> > org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) >> > Caused by: java.lang.IllegalArgumentException >> > at java.nio.Buffer.limit(Buffer.java:267) >> > at >> > >> > org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543) >> > at >> > >> > org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124) >> > at >> > >> > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101) >> > at >> > >> > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30) >> > at >> > >> > org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50) >> > at >> > >> > org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68) >> > at >> > >> > org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287) >> > at >> > >> > org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833) >> > at >> > >> > org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126) >> > at >> > >> > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330) >> > ... 
2 more >> > >> > I don't know whether this error is relative to one of cluster nodes' >> > linux >> > crash? >> > >> > Any advice will be appreciated! >> > >> > Dillon Peng >> >> >> >> -- >> Yuki Morishita >> t:yukim (http://twitter.com/yukim) > > -- Yuki Morishita t:yukim (http://twitter.com/yukim)
why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
hi, all I used sstableloader many times successfully, but this time I got the following error: [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw password -v -d 172.21.0.131 ./currentdata/keyspace/table Could not retrieve endpoint ranges: java.lang.IllegalArgumentException java.lang.RuntimeException: Could not retrieve endpoint ranges: at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338) at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156) at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) Caused by: java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:267) at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543) at org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124) at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101) at org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30) at org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50) at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68) at org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287) at org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833) at org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126) at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330) ... 2 more I don't know whether this error is related to a Linux crash on one of the cluster nodes. Any advice will be appreciated! Dillon Peng
Re: why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
This is known issue. https://issues.apache.org/jira/browse/CASSANDRA-10700 It is fixed in not-yet-released version 2.1.13. So, you need to build from the latest cassandra-2.1 branch to try. On Mon, Dec 28, 2015 at 5:28 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: > hi, all > I used the sstableloader many times successfully, but I got the following > error: > > [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw > password -v -d 172.21.0.131 ./currentdata/keyspace/table > > Could not retrieve endpoint ranges: > java.lang.IllegalArgumentException > java.lang.RuntimeException: Could not retrieve endpoint ranges: > at > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:338) > at > org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156) > at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:106) > Caused by: java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Buffer.java:267) > at > org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:543) > at > org.apache.cassandra.serializers.CollectionSerializer.readValue(CollectionSerializer.java:124) > at > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:101) > at > org.apache.cassandra.serializers.MapSerializer.deserializeForNativeProtocol(MapSerializer.java:30) > at > org.apache.cassandra.serializers.CollectionSerializer.deserialize(CollectionSerializer.java:50) > at > org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:68) > at > org.apache.cassandra.cql3.UntypedResultSet$Row.getMap(UntypedResultSet.java:287) > at > org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1833) > at > org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1126) > at > org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:330) > ... 2 more > > I don't know whether this error is relative to one of cluster nodes' linux > crash? 
> > Any advice will be appreciated! > > Dillon Peng -- Yuki Morishita t:yukim (http://twitter.com/yukim)
Re: why I got error "Could not retrieve endpoint ranges" when I run sstableloader?
hi, Yuki Thank you very much! The issue's description almost fits my case! 1. My Cassandra version is 2.1.11 2. my table has several columns with collection types 3. Before it failed this time, I could use sstableloader to load data into this table, but I got this error after I dropped one column with a collection type and added a column with int type. Do you think it will resolve my problem if I update the version to 2.1.13? Also, my table already has 560 million records. So, to resolve this, do I only need to update to the new version's C*.jar and restart Cassandra? Dillon 2015-12-29 7:36 GMT+08:00 Yuki Morishita <mor.y...@gmail.com>: > This is a known issue. > > https://issues.apache.org/jira/browse/CASSANDRA-10700 > > It is fixed in the not-yet-released version 2.1.13. > So, you need to build from the latest cassandra-2.1 branch to try it. > > On Mon, Dec 28, 2015 at 5:28 PM, 土卜皿 <pengcz.n...@gmail.com> wrote: > > hi, all > > I used the sstableloader many times successfully, but I got the > following error: > > > > [root@localhost pengcz]# /usr/local/cassandra/bin/sstableloader -u user -pw > > password -v -d 172.21.0.131 ./currentdata/keyspace/table > > > > Could not retrieve endpoint ranges: > > java.lang.IllegalArgumentException > > java.lang.RuntimeException: Could not retrieve endpoint ranges: > > [same stack trace as above] > > > > I don't know whether this error is relative to one of cluster nodes' linux > > crash? > > > > Any advice will be appreciated! > > > > Dillon Peng > > -- > Yuki Morishita > t:yukim (http://twitter.com/yukim) >
Re: Running sstableloader from every node when migrating?
Thank you Robert and Anuja, It does not seem that sstable2json is the right tool to use: there is no documentation beyond Cassandra 1.2, and it requires a specific sstable to be given, which means a lot of manual work. The documentation also mentions it is good for testing/debugging, but I would need to migrate nearly 1 TB of data from a 6-node cluster to a 3-node one. Copying sstables/nodetool refresh does not seem like a great option either, unless I am missing something. Using sstableloader seems a more logical option. Still a bottleneck if you need to do it for every node in your source cluster. What if you had a 100-node cluster? I am thinking of just running a simple script instead, that selects data from the source cluster and inserts it into the target one. Kind regards, George On Tue, Dec 1, 2015 at 7:54 AM, anuja jain <anujaja...@gmail.com> wrote: > Hello George, > You can use sstable2json to create the json of your keyspace and then load > this json to your keyspace in the new cluster using the json2sstable utility. > > On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli <rc...@eventbrite.com> wrote: > >> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos <sigle...@textkernel.nl> >> wrote: >> >>> We would like to migrate one keyspace from a 6-node cluster to a 3-node >>> one. >>> >> >> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/ >> >> =Rob >> >> > >
Re: Running sstableloader from every node when migrating?
On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos wrote:
> We would like to migrate one keyspace from a 6-node cluster to a 3-node one.

http://www.pythian.com/blog/bulk-loading-options-for-cassandra/

=Rob
Re: Running sstableloader from every node when migrating?
Hello George,

You can use sstable2json to create the json of your keyspace and then load this json to your keyspace in the new cluster using the json2sstable utility.

On Tue, Dec 1, 2015 at 3:06 AM, Robert Coli wrote:
> On Thu, Nov 19, 2015 at 7:01 AM, George Sigletos wrote:
>> We would like to migrate one keyspace from a 6-node cluster to a 3-node one.
>
> http://www.pythian.com/blog/bulk-loading-options-for-cassandra/
>
> =Rob
Running sstableloader from every node when migrating?
Hello,

We would like to migrate one keyspace from a 6-node cluster to a 3-node one. Since an individual node does not contain all the data, this means that we should run sstableloader 6 times, once for each node of our cluster.

To be precise: do "nodetool flush", then run

sstableloader -d <3 target nodes>

Would that be the correct approach?

Thank you in advance,
George
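The per-node procedure described above could be scripted roughly as follows. This is a dry-run sketch only: node names, the keyspace/table, and the data path are hypothetical, and the commands are echoed rather than executed (in a real run you would drop the `echo` and point `DATA_DIR` at each node's actual sstable directory):

```shell
# Sketch: flush then bulk-load from each of the 6 source nodes into the
# 3-node target cluster. All names/paths below are placeholders.
SOURCE_NODES="src1 src2 src3 src4 src5 src6"
TARGETS="tgt1,tgt2,tgt3"
DATA_DIR="/var/lib/cassandra/data/mykeyspace/mytable"

# Dry run: print what would be executed on each source node.
for node in $SOURCE_NODES; do
  echo "ssh $node nodetool flush"
  echo "ssh $node sstableloader -d $TARGETS $DATA_DIR"
done
```

Since replicas overlap across the 6 source nodes, the same row may be streamed several times; sstableloader handles this correctly (later writes are deduplicated by timestamp), but it does inflate the total streaming time.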
Re: Data.db too large and after sstableloader still large
On Thu, Nov 12, 2015 at 6:44 AM, qihuang.zheng <qihuang.zh...@fraudmetrix.cn> wrote:
> My question is: why can't sstableloader balance the data file size?

Because it streams ranges from the source SSTable to a distributed set of ranges, especially if you are using vnodes. It is a general property of Cassandra's streaming that it results in SSTables that are likely different in size from those that result from a flush.

Why are you preoccupied with the sizes of files in the hundreds of megabytes? Why do you care about this amount of variance in file size?

=Rob
Re: Data.db too large and after sstableloader still large
Thanks, Rob.

We use spark-cassandra-connector to read data from the table, then do a repartition action. If some nodes have large files, running this task is too slow, maybe several hours, which is unacceptable. The nodes with small files finish quickly. So I think if sstableloader could split into small sizes and balance across all the nodes, our Spark job could run quickly.

Thanks, qihuang.zheng

---- Original Message ----
From: Robert Coli <rc...@eventbrite.com>
To: user@cassandra.apache.org
Sent: Friday, 13 November 2015, 04:04
Subject: Re: Data.db too large and after sstableloader still large
Data.db too large and after sstableloader still large
We did a snapshot, and found some Data.db files are too large:

[qihuang.zheng@spark047219 5]$ find . -type f -size +800M -print0 | xargs -0 ls -lh
-rw-r--r--. 2 qihuang.zheng users 1.5G Oct 28 14:49 ./forseti/velocity/forseti-velocity-jb-103631-Data.db

After running sstableloader to the new cluster, one node has this large file:

[qihuang.zheng@spark047243 velocity]$ ll -rth | grep Data
-rw-r--r--. 1 admin admin  46M Nov 12 18:22 forseti-velocity-jb-21-Data.db
-rw-r--r--. 1 admin admin 156M Nov 12 18:22 forseti-velocity-jb-22-Data.db
-rw-r--r--. 1 admin admin 2.6M Nov 12 18:22 forseti-velocity-jb-23-Data.db
-rw-r--r--. 1 admin admin 162M Nov 12 18:22 forseti-velocity-jb-24-Data.db
-rw-r--r--. 1 admin admin 1.5G Nov 12 18:22 forseti-velocity-jb-25-Data.db  <- big file still here

It seems sstableloader doesn't split files very well. Why can't sstableloader split the data into small files on the new cluster? I tried using sstablesplit on the snapshot before sstableloader, but this process is too slow.

Thanks, qihuang.zheng
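For reference, the sstablesplit attempt mentioned above looks roughly like this. This is a dry-run sketch: the file path is the big file from the listing above, the 100 MB chunk size is an arbitrary illustrative choice, and the command is echoed rather than run (sstablesplit must only be run against a snapshot or with Cassandra stopped, since it rewrites the sstable in place):

```shell
# Sketch: split a large snapshot SSTable into ~100 MB chunks before
# feeding it to sstableloader. Size is illustrative.
SSTABLE="./forseti/velocity/forseti-velocity-jb-103631-Data.db"
SPLIT_SIZE_MB=100

# Dry run: print the command instead of executing it.
echo "sstablesplit --no-snapshot -s $SPLIT_SIZE_MB $SSTABLE"
```

The split itself is sequential I/O over the whole 1.5 GB file, which is consistent with the "too slow" observation; it also does not change how sstableloader distributes the resulting ranges across target nodes.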
Re: Data.db too large and after sstableloader still large
Original snapshot files:

[qihuang.zheng@spark047219 226_1105]$ ll 2/forseti/velocity/ -h | grep Data
-rw-r--r--. 1 qihuang.zheng users 158M Oct 28 15:03 forseti-velocity-jb-102486-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 16:28 forseti-velocity-jb-103911-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 14:23 forseti-velocity-jb-103920-Data.db
-rw-r--r--. 1 qihuang.zheng users 370M Oct 28 14:10 forseti-velocity-jb-105829-Data.db  <- a big file (1)
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 14:07 forseti-velocity-jb-107113-Data.db
-rw-r--r--. 1 qihuang.zheng users 160M Oct 28 15:53 forseti-velocity-jb-73122-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 14:46 forseti-velocity-jb-85829-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 15:29 forseti-velocity-jb-87661-Data.db
-rw-r--r--. 1 qihuang.zheng users 161M Oct 28 15:05 forseti-velocity-jb-93091-Data.db

After sstableloader to the new cluster:

[qihuang.zheng@cass047202 ~]$ ./psshA.sh ip_spark.txt 'ls /home/admin/cassandra/data/forseti/velocity -hl | grep Data'
Warning: do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 22:29:43 [SUCCESS] 192.168.47.208
-rw-r--r--. 1 admin admin 365K Nov 12 22:10 forseti-velocity-jb-20-Data.db
-rw-r--r--. 1 admin admin 370M Nov 12 22:10 forseti-velocity-jb-21-Data.db  <- file still large, same size as (1)
-rw-r--r--. 1 admin admin  11M Nov 12 22:10 forseti-velocity-jb-22-Data.db
[2] 22:29:43 [SUCCESS] 192.168.47.212
-rw-r--r--. 1 admin admin 146M Nov 12 22:09 forseti-velocity-jb-22-Data.db
-rw-r--r--. 1 admin admin 3.7M Nov 12 22:09 forseti-velocity-jb-23-Data.db
[3] 22:29:43 [SUCCESS] 192.168.47.215
-rw-r--r--. 1 admin admin 916K Nov 12 22:09 forseti-velocity-jb-14-Data.db
[4] 22:29:43 [SUCCESS] 192.168.47.242  <- most of the data went to this node
-rw-r--r--. 1 admin admin 106M Nov 12 22:10 forseti-velocity-jb-24-Data.db
-rw-r--r--. 1 admin admin 160M Nov 12 22:10 forseti-velocity-jb-25-Data.db
-rw-r--r--. 1 admin admin 158M Nov 12 22:10 forseti-velocity-jb-26-Data.db
-rw-r--r--. 1 admin admin 160M Nov 12 22:10 forseti-velocity-jb-27-Data.db
[5] 22:29:43 [FAILURE] 192.168.47.223 Exited with error code 1  <- this node has no files
[6] 22:29:43 [SUCCESS] 192.168.47.244
-rw-r--r--. 1 admin admin 111M Nov 12 22:09 forseti-velocity-jb-18-Data.db
[7] 22:29:43 [SUCCESS] 192.168.47.245
-rw-r--r--. 1 admin admin  50M Nov 12 22:09 forseti-velocity-jb-22-Data.db
-rw-r--r--. 1 admin admin 170K Nov 12 22:09 forseti-velocity-jb-23-Data.db
[8] 22:29:43 [SUCCESS] 192.168.47.241
-rw-r--r--. 1 admin admin 7.5M Nov 12 22:09 forseti-velocity-jb-30-Data.db
[9] 22:29:43 [FAILURE] 192.168.47.218 Exited with error code 1  <- no files
[10] 22:29:43 [SUCCESS] 192.168.47.243
-rw-r--r--. 1 admin admin  15M Nov 12 22:09 forseti-velocity-jb-29-Data.db
[11] 22:29:43 [SUCCESS] 192.168.47.219
-rw-r--r--. 1 admin admin 160M Nov 12 22:09 forseti-velocity-jb-23-Data.db
[12] 22:29:43 [SUCCESS] 192.168.47.217
-rw-r--r--. 1 admin admin  30M Nov 12 22:09 forseti-velocity-jb-22-Data.db
[13] 22:29:44 [SUCCESS] 192.168.47.216
-rw-r--r--. 1 admin admin 3.5M Nov 12 22:09 forseti-velocity-jb-20-Data.db
-rw-r--r--. 1 admin admin 161M Nov 12 22:09 forseti-velocity-jb-21-Data.db

We use spark-cassandra-connector to read the table and repartition. The Spark repartition job shows that nodes with no Data.db files, like the two failed nodes above, have an input size of 0.0 B, while nodes with large files, like the last one, run for far too long.

My question is: why can't sstableloader balance the data file size?

Thanks, qihuang.zheng