Re: SSTableloader questions

2020-11-12 Thread Erick Ramirez
>
> Can the sstableloader job run from outside a Cassandra node, or does it
> have to be run from inside a Cassandra node?
>

Yes, I'm a fan of running sstableloader on a server that is not one of the
nodes in the cluster. You can maximise the throughput by running multiple
instances of sstableloader loading SSTables from separate
sources/filesystems.

My suspicion is that the failed connection to the nodes is due to the SSL
options so check that you've specified the truststore/keystore correctly.
Cheers!
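Erick's tip about running multiple parallel instances can be sketched as below. This is a hypothetical dry run, not a command from the thread: the staging directories and node IPs are placeholders, and `DRY_RUN=echo` only prints the commands (drop it, and append `&` plus a final `wait`, to actually run the loaders in parallel).

```shell
# Hypothetical sketch: one sstableloader invocation per source directory,
# intended to be run from a host outside the cluster. All paths and node
# IPs are placeholders; DRY_RUN=echo makes this print rather than execute.
DRY_RUN=echo
out=$(
  for dir in /staging/node1/keyspace1/table1 /staging/node2/keyspace1/table1; do
    $DRY_RUN sstableloader -d ip1,ip2,ip3 "$dir"
  done
)
printf '%s\n' "$out"
```

Each instance streams its own set of SSTables, so keeping the sources on separate filesystems avoids the loaders competing for the same disk.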

>


Re: SSTableloader questions

2020-11-12 Thread Jai Bheemsen Rao Dhanwada
Hello Erick,

I have one more question.

Can the sstableloader job run from outside a Cassandra node, or does it
have to be run from inside a Cassandra node?

When I tried it from a Cassandra node it worked, but when I try to run it
from outside the Cassandra cluster (a standalone machine which doesn't have
any Cassandra process running) using the below command, it fails with a
streaming error.

*Command:*

> $ /root/apache-cassandra-3.11.6/bin/sstableloader -d ip1,ip2,ip3
> keyspace1/table1 --truststore truststore.p12 --truststore-password
> cassandra --keystore-password cassandra --keystore keystore.p12 -v -u user
> -pw password --ssl-storage-port 7001 -prtcl TLS


*Errors:*

> ERROR 21:48:22,078 [Stream #be7a0de0-2530-11eb-bc56-c7c5c59d560b]
> Streaming error occurred on session with peer 10.66.129.194
> java.net.ConnectException: Connection refused
> at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_272]
> at sun.nio.ch.Net.connect(Net.java:482) ~[na:1.8.0_272]
> at sun.nio.ch.Net.connect(Net.java:474) ~[na:1.8.0_272]
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:647)
> ~[na:1.8.0_272]
> at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
> ~[na:1.8.0_272]
> at
> org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:283)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:86)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:270)
> ~[apache-cassandra-3.11.6.jar:3.11.6]
> at
> org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:269)
> [apache-cassandra-3.11.6.jar:3.11.6]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [na:1.8.0_272]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_272]
> at
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
> [apache-cassandra-3.11.6.jar:3.11.6]
> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]
> progress: total: 100% 0.000KiB/s (avg: 0.000KiB/s)
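A hedged first diagnostic (my suggestion, not something from the thread): the trace shows a plain `Connection refused` while opening the streaming socket, which a firewall or listen-address problem produces just as readily as an SSL misconfiguration. So first confirm the storage port is reachable from the loader machine at all. The port matches the `--ssl-storage-port 7001` in the command above and the host is the peer from the trace.

```shell
# Check TCP reachability of a node's storage port from the loader machine.
# Requires bash for the /dev/tcp pseudo-device; timeout is from coreutils.
check_port() {
  host=$1; port=$2
  if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "closed: $host:$port"
  fi
}
check_port 10.66.129.194 7001
```

If the port turns out to be open, `openssl s_client -connect <host>:7001` can then confirm the TLS handshake and certificate chain separately.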


On Mon, Nov 9, 2020 at 3:08 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Thanks Erick, I will go through the posts and get back if I have any
> questions.
>
> On Mon, Nov 9, 2020 at 1:58 PM Erick Ramirez 
> wrote:
>
>> A few months ago, I was asked a similar question so I wrote instructions
>> for this. It depends on whether the clusters are identical or not. The
>> posts define what "identical" means.
>>
>> If the source and target cluster are identical in configuration, follow
>> the procedure here -- https://community.datastax.com/questions/4534/.
>>
>> If the source and target cluster have different configurations, follow
>> the procedure here -- https://community.datastax.com/questions/4477/.
>> Cheers!
>>
>


Re: SSTableloader questions

2020-11-09 Thread Jai Bheemsen Rao Dhanwada
Thanks Erick, I will go through the posts and get back if I have any
questions.

On Mon, Nov 9, 2020 at 1:58 PM Erick Ramirez 
wrote:

> A few months ago, I was asked a similar question so I wrote instructions
> for this. It depends on whether the clusters are identical or not. The
> posts define what "identical" means.
>
> If the source and target cluster are identical in configuration, follow
> the procedure here -- https://community.datastax.com/questions/4534/.
>
> If the source and target cluster have different configurations, follow the
> procedure here -- https://community.datastax.com/questions/4477/. Cheers!
>


Re: SSTableloader questions

2020-11-09 Thread Erick Ramirez
A few months ago, I was asked a similar question so I wrote instructions
for this. It depends on whether the clusters are identical or not. The
posts define what "identical" means.

If the source and target cluster are identical in configuration, follow the
procedure here -- https://community.datastax.com/questions/4534/.

If the source and target cluster have different configurations, follow the
procedure here -- https://community.datastax.com/questions/4477/. Cheers!


Re: sstableloader - warning vs. failure?

2020-02-07 Thread James A. Robinson
Ok, thanks very much the answer!

On Fri, Feb 7, 2020 at 9:00 PM Erick Ramirez  wrote:

> INFO  [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 -
>> Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
>>
>
> The message gets logged when SSTables are being cached and the cache fills
> up faster than objects are evicted from it. Note that the message is logged
> at INFO level (instead of WARN or ERROR) because there is no detrimental
> effect but there will be a performance hit in the form of read latency.
> When space becomes available, it will just continue on to cache the next
> 64k chunk of the sstable.
>
> FWIW The default cache size (file_cache_size_in_mb in cassandra.yaml) is
> 512 MB (max memory of 536870912 in the log entry). Cheers!
>


Re: sstableloader - warning vs. failure?

2020-02-07 Thread Erick Ramirez
>
> INFO  [pool-1-thread-4] 2020-02-08 01:35:37,946 NoSpamLogger.java:91 -
> Maximum memory usage reached (536870912), cannot allocate chunk of 1048576
>

The message gets logged when SSTables are being cached and the cache fills
up faster than objects are evicted from it. Note that the message is logged
at INFO level (instead of WARN or ERROR) because there is no detrimental
effect but there will be a performance hit in the form of read latency.
When space becomes available, it will just continue on to cache the next
64k chunk of the sstable.

FWIW The default cache size (file_cache_size_in_mb in cassandra.yaml) is
512 MB (max memory of 536870912 in the log entry). Cheers!
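If the read-latency hit does matter, the cap can be raised. A sketch of the relevant setting — the value here is illustrative, not a recommendation from the thread:

```yaml
# Illustrative cassandra.yaml fragment: raise the chunk-cache cap from the
# 512 MiB default (the 536870912 bytes in the log line above) to 1 GiB.
# Takes effect after a node restart.
file_cache_size_in_mb: 1024
```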


Re: sstableloader: How much does it actually need?

2020-02-07 Thread Reid Pinchback
Just mulling this based on some code and log digging I was doing while trying 
to have Reaper stay on top of our cluster.

I think maybe the caveat here relates to eventual consistency.  C* doesn’t do 
state changes as distributed transactions.  The assumption here is that RF=3 is 
implying that at any given instant in real time, either the data is visible 
nowhere, or it is visible in 3 places.  That’s a conceptual simplification but 
not a real time invariant when you don’t have a transactional horizon to 
perfectly determine visibility of data.

When you have C* usage antipatterns like a client that is determined to read 
back data that it just wrote, as though there was a session context that 
somehow provided repeatable read guarantees, under the covers in the logs you 
can see C* fighting to do on-the-fly repairs to push through the requested 
level of consistency before responding to the query.  Which means, for some 
period of time, that achieving consistency was still work in flight.

I’ve also read about some boundary screw cases like drift in time resolution 
between servers creating the opportunity for stale data, and repairs I think 
would fix that. I haven’t tested the scenario though, so I’m not sure how real 
the situation is.

Bottom line though, minus repairs, I think having all the nodes is getting you 
all your chances to repair the problems.  And if the data is mutating as you 
are grabbing it, the entire frontier of changes is ‘minus repairs’.  Since 
tokens are distributed somewhat randomly, you don’t know where you need to make 
up the differences after.

That’s about as far as my navel gazing goes on that.

From: manish khandelwal 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, February 7, 2020 at 12:22 AM
To: "user@cassandra.apache.org" 
Subject: Re: sstableloader: How much does it actually need?

Message from External Sender
Yes, you will have all the data on two nodes, provided there are no mutation
drops at the node level or the data has been repaired.

For example, say your data consists of A, B, C and D, with RF=3 and 4 nodes
(node1, node2, node3 and node4):

Data A is in node1, node2 and node3
Data B is in node2, node3, and node4
Data C is in node3, node4 and node1
Data D is in node4, node1 and node2

With this configuration, any two nodes combined will give all the data.


Regards
Manish

On Fri, Feb 7, 2020 at 12:53 AM Voytek Jarnot wrote:
Been thinking about it, and I can't really see how with 4 nodes and RF=3, any 2 
nodes would *not* have all the data; but am more than willing to learn.

On the other thing: that's an attractive option, but in our case, the target 
cluster will likely come into use before the source-cluster data is available 
to load. Seemed to me the safest approach was sstableloader.

Thanks

On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez wrote:
Unfortunately, there isn't a guarantee that 2 nodes alone will have the full 
copy of data. I'd rather not say "it depends". 

TIP: If the nodes in the target cluster have identical tokens allocated, you 
can just do a straight copy of the sstables node-for-node then do nodetool 
refresh. If the target cluster is already built and you can't assign the same 
tokens then sstableloader is your only option. Cheers!

P.S. No need to apologise for asking questions. That's what we're all here for. 
Just keep them coming. 


Re: sstableloader: How much does it actually need?

2020-02-06 Thread manish khandelwal
Yes, you will have all the data on two nodes, provided there are no mutation
drops at the node level or the data has been repaired.

For example, say your data consists of A, B, C and D, with RF=3 and 4 nodes
(node1, node2, node3 and node4):

Data A is in node1, node2 and node3
Data B is in node2, node3, and node4
Data C is in node3, node4 and node1
Data D is in node4, node1 and node2

With this configuration, any *two nodes combined* will give all the data.
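Manish's placement can be checked mechanically. A small shell sketch (the node and datum layout simply mirrors the example above) that enumerates every pair of nodes and confirms no datum is missed by both:

```shell
# Each entry lists the three replicas holding one datum (RF=3 over 4 nodes),
# matching the placement in the example above.
replicas="node1,node2,node3 node2,node3,node4 node3,node4,node1 node4,node1,node2"

misses=0
for pair in "node1 node2" "node1 node3" "node1 node4" \
            "node2 node3" "node2 node4" "node3 node4"; do
  set -- $pair
  n1=$1; n2=$2
  for row in $replicas; do
    case ",$row," in
      *",$n1,"*|*",$n2,"*) : ;;       # at least one node of the pair holds it
      *) misses=$((misses + 1)) ;;    # neither node of the pair holds this datum
    esac
  done
done
echo "data items missed by some pair: $misses"   # prints: data items missed by some pair: 0
```

The intuition: with RF=3 on 4 nodes, each datum is absent from exactly one node, so any pair of nodes must include at least one replica.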


Regards
Manish

On Fri, Feb 7, 2020 at 12:53 AM Voytek Jarnot 
wrote:

> Been thinking about it, and I can't really see how with 4 nodes and RF=3,
> any 2 nodes would *not* have all the data; but am more than willing to
> learn.
>
> On the other thing: that's an attractive option, but in our case, the
> target cluster will likely come into use before the source-cluster data is
> available to load. Seemed to me the safest approach was sstableloader.
>
> Thanks
>
> On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez  wrote:
>
>> Unfortunately, there isn't a guarantee that 2 nodes alone will have the
>> full copy of data. I'd rather not say "it depends". 
>>
>> TIP: If the nodes in the target cluster have identical tokens allocated,
>> you can just do a straight copy of the sstables node-for-node then do 
>> nodetool
>> refresh. If the target cluster is already built and you can't assign the
>> same tokens then sstableloader is your only option. Cheers!
>>
>> P.S. No need to apologise for asking questions. That's what we're all
>> here for. Just keep them coming. 
>>
>>>


Re: sstableloader: How much does it actually need?

2020-02-06 Thread Voytek Jarnot
Been thinking about it, and I can't really see how with 4 nodes and RF=3,
any 2 nodes would *not* have all the data; but am more than willing to
learn.

On the other thing: that's an attractive option, but in our case, the
target cluster will likely come into use before the source-cluster data is
available to load. Seemed to me the safest approach was sstableloader.

Thanks

On Wed, Feb 5, 2020 at 6:56 PM Erick Ramirez  wrote:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the
> full copy of data. I'd rather not say "it depends". 
>
> TIP: If the nodes in the target cluster have identical tokens allocated,
> you can just do a straight copy of the sstables node-for-node then do nodetool
> refresh. If the target cluster is already built and you can't assign the
> same tokens then sstableloader is your only option. Cheers!
>
> P.S. No need to apologise for asking questions. That's what we're all here
> for. Just keep them coming. 
>
>>


Re: sstableloader: How much does it actually need?

2020-02-05 Thread Erick Ramirez
>
> Another option is the DSE Bulk Loader, but it requires converting the data
> to CSV/JSON (a good option if you don't want to deal with sstableloader
> and with gathering all the SSTables from all the nodes)
> https://docs.datastax.com/en/dsbulk/doc/index.html
>

Thanks, Sergio. The DataStax Bulk Loader was developed for a completely
different use case. It doesn't really make sense to go through trouble of
converting the SSTables to CSV/JSON when you've already got the SSTables to
begin with. ☺

It was really designed for loading/unloading data from non-C* sources as a
replacement for the COPY command. Cheers!


Re: sstableloader: How much does it actually need?

2020-02-05 Thread Dor Laor
Another option is to use the Spark migrator: it reads a source CQL cluster
and writes to another. It has a validation stage that compares a full scan
and reports the diff:
https://github.com/scylladb/scylla-migrator

There are many more ways to clone a cluster. My main recommendation is to
optimize for correctness and simplicity first, and only optimize for
performance/time last. Machine time for such a rare operation is cheap,
engineering time is expensive, and data inconsistency is priceless.

On Wed, Feb 5, 2020 at 5:24 PM Sergio  wrote:
>
> Another option is the DSE Bulk Loader, but it requires converting the data
> to CSV/JSON (a good option if you don't want to deal with sstableloader
> and with gathering all the SSTables from all the nodes)
> https://docs.datastax.com/en/dsbulk/doc/index.html
>
> Cheers
>
> Sergio
>
> Il giorno mer 5 feb 2020 alle ore 16:56 Erick Ramirez  
> ha scritto:
>>
>> Unfortunately, there isn't a guarantee that 2 nodes alone will have the full 
>> copy of data. I'd rather not say "it depends".
>>
>> TIP: If the nodes in the target cluster have identical tokens allocated, you 
>> can just do a straight copy of the sstables node-for-node then do nodetool 
>> refresh. If the target cluster is already built and you can't assign the 
>> same tokens then sstableloader is your only option. Cheers!
>>
>> P.S. No need to apologise for asking questions. That's what we're all here 
>> for. Just keep them coming.




Re: sstableloader: How much does it actually need?

2020-02-05 Thread Sergio
Another option is the DSE Bulk Loader, but it requires converting the data
to CSV/JSON (a good option if you don't want to deal with sstableloader
and with gathering all the SSTables from all the nodes)
https://docs.datastax.com/en/dsbulk/doc/index.html

Cheers

Sergio

Il giorno mer 5 feb 2020 alle ore 16:56 Erick Ramirez 
ha scritto:

> Unfortunately, there isn't a guarantee that 2 nodes alone will have the
> full copy of data. I'd rather not say "it depends". 
>
> TIP: If the nodes in the target cluster have identical tokens allocated,
> you can just do a straight copy of the sstables node-for-node then do nodetool
> refresh. If the target cluster is already built and you can't assign the
> same tokens then sstableloader is your only option. Cheers!
>
> P.S. No need to apologise for asking questions. That's what we're all here
> for. Just keep them coming. 
>
>>


Re: sstableloader: How much does it actually need?

2020-02-05 Thread Erick Ramirez
Unfortunately, there isn't a guarantee that 2 nodes alone will have the
full copy of data. I'd rather not say "it depends". 

TIP: If the nodes in the target cluster have identical tokens allocated,
you can just do a straight copy of the sstables node-for-node then do nodetool
refresh. If the target cluster is already built and you can't assign the
same tokens then sstableloader is your only option. Cheers!
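The node-for-node copy Erick describes can be sketched as below. Everything here is illustrative: hosts, keyspace, table, snapshot name, and directory layout are placeholders, and the function only prints the commands rather than executing them.

```shell
# Illustrative only: restore one table's snapshot node-for-node on a target
# node whose token assignment matches the source node, then have Cassandra
# pick up the new SSTables with nodetool refresh. All arguments are
# placeholders; echo keeps this a dry run.
restore_table() {
  src_host=$1; snap_dir=$2; data_dir=$3; ks=$4; tbl=$5
  echo "rsync -a $src_host:$snap_dir/ $data_dir/"
  echo "nodetool refresh $ks $tbl"
}
restore_table source-node1 \
  /var/lib/cassandra/data/ks1/tbl1-1a2b3c/snapshots/migrate \
  /var/lib/cassandra/data/ks1/tbl1-1a2b3c ks1 tbl1
```

The point of the token requirement: each SSTable only contains data for the ranges its source node owned, so the copy is only correct when the target node owns exactly the same ranges.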

P.S. No need to apologise for asking questions. That's what we're all here
for. Just keep them coming. 

>


Re: [EXTERNAL] Re: sstableloader & num_tokens change

2020-01-27 Thread Voytek Jarnot
Odd. Have you seen this behavior? I ran a test last week, loaded snapshots
from 4 nodes to 4 nodes (RF 3 on both ends) and did not notice a spike.
That's not to say that it didn't happen, but I think I'd have noticed as I
was loading approx 250GB x 4 (although sequentially rather than 4x
sstableloader in parallel).

Also, thanks to everyone for confirming no issue with num_tokens and
sstableloader; appreciate it.


On Mon, Jan 27, 2020 at 9:02 AM Durity, Sean R 
wrote:

> I would suggest being aware of potential data size expansion. If you load
> (for example) three copies of the data into a new cluster (because the RF
> of the origin cluster is 3), it will also get written to the RF of the new
> cluster (3 more times). So, you could see data expansion of 9x the original
> data size (or, origin RF * target RF), until compaction can run.
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Erick Ramirez 
> *Sent:* Friday, January 24, 2020 11:03 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: sstableloader & num_tokens change
>
>
>
>
>
> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore
> snapshots taken from 256-token nodes into a cluster with 32-token (or your
> preferred number of tokens) nodes (otherwise same # of nodes and same RF).
>
>
>
> No, there isn't. It will work as designed so you're good to go. Cheers!
>
>
>
>
>
>
> --
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>


RE: [EXTERNAL] Re: sstableloader & num_tokens change

2020-01-27 Thread Durity, Sean R
I would suggest being aware of potential data size expansion. If you load (for
example) three copies of the data into a new cluster (because the RF of the 
origin cluster is 3), it will also get written to the RF of the new cluster (3 
more times). So, you could see data expansion of 9x the original data size (or, 
origin RF * target RF), until compaction can run.
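The expansion factor Sean describes is just the product of the two replication factors. A one-line sketch with the RF=3 values from the example:

```shell
# Worst-case pre-compaction copies of each unique datum in the target
# cluster: every source replica streamed in is re-replicated target_rf times.
origin_rf=3
target_rf=3
expansion=$((origin_rf * target_rf))
echo "up to ${expansion}x the unique data size before compaction"
```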


Sean Durity – Staff Systems Engineer, Cassandra

From: Erick Ramirez 
Sent: Friday, January 24, 2020 11:03 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: sstableloader & num_tokens change


If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore 
snapshots taken from 256-token nodes into a cluster with 32-token (or your 
preferred number of tokens) nodes (otherwise same # of nodes and same RF).

No, there isn't. It will work as designed so you're good to go. Cheers!







Re: sstableloader & num_tokens change

2020-01-27 Thread Jean Carlo
Hello

Concerning the original question, I agree with @eric_ramirez:
sstableloader is transparent with respect to the token allocation number.

Just for info @voytek, check this post out:
https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
You may be interested to know whether your cluster is well balanced with 32
tokens. 32 tokens seems set to become the future default value, but changing
the default number of vnode tokens is not so straightforward.
cheers

Jean Carlo

"The best way to predict the future is to invent it" Alan Kay


On Sat, Jan 25, 2020 at 5:05 AM Erick Ramirez  wrote:

> On the subject of DSBulk, sstableloader is the tool of choice for this
> scenario.
>
> +1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader
> for CSV/JSON formats. Cheers!
>


Re: sstableloader & num_tokens change

2020-01-24 Thread Erick Ramirez
On the subject of DSBulk, sstableloader is the tool of choice for this
scenario.

+1 to Sergio and I'm confirming that DSBulk is designed as a bulk loader
for CSV/JSON formats. Cheers!


Re: sstableloader & num_tokens change

2020-01-24 Thread Erick Ramirez
> If I may just loop this back to the question at hand:
>
> I'm curious if there are any gotchas with using sstableloader to restore
> snapshots taken from 256-token nodes into a cluster with 32-token (or your
> preferred number of tokens) nodes (otherwise same # of nodes and same RF).
>

No, there isn't. It will work as designed so you're good to go. Cheers!


>


Re: sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
If I may just loop this back to the question at hand:

I'm curious if there are any gotchas with using sstableloader to restore
snapshots taken from 256-token nodes into a cluster with 32-token (or your
preferred number of tokens) nodes (otherwise same # of nodes and same RF).

On Fri, Jan 24, 2020 at 11:15 AM Sergio  wrote:

> https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html
>
> Just skimming through the docs
>
> I see examples by loading from CSV / JSON
>
> Maybe there is some other command or doc page that I am missing
>
>
>
>
> On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth  wrote:
>
>> DSBulk works the same as sstableloader.
>>
>>
>> Regards,
>>
>> Nitan
>>
>> Cell: 510 449 9629
>>
>> On Jan 24, 2020, at 10:40 AM, Sergio  wrote:
>>
>> 
>> I was wondering if that improvement for token allocation would work even
>> with just one rack. It should but I am not sure.
>>
>> Does Dsbulk support migration cluster to cluster without CSV or JSON
>> export?
>>
>> Thanks and Regards
>>
>> On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth  wrote:
>>
>>> Instead of sstableloader consider dsbulk by datastax.
>>>
>>> On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback <
>>> rpinchb...@tripadvisor.com> wrote:
>>>
 Jon Haddad has previously made the case for num_tokens=4.  His
 Accelerate 2019 talk is available at:



 https://www.youtube.com/watch?v=swL7bCnolkU



 You might want to check that out.  Also I think the amount of effort
 you put into evening out the token distribution increases as vnode count
 shrinks.  The caveats are explored at:




 https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html





 *From: *Voytek Jarnot 
 *Reply-To: *"user@cassandra.apache.org" 
 *Date: *Friday, January 24, 2020 at 10:39 AM
 *To: *"user@cassandra.apache.org" 
 *Subject: *sstableloader & num_tokens change




 Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different
 4 node RF=3 cluster.



 I've read that 256 is not an optimal default num_tokens value, and that
 32 is likely a better option.



 We have the "opportunity" to switch, as we're migrating environments
 and will likely be using sstableloader to do so. I'm curious if there are
 any gotchas with using sstableloader to restore snapshots taken from
 256-token nodes into a cluster with 32-token nodes (otherwise same # of
 nodes and same RF).



 Thanks in advance.

>>>


Re: sstableloader & num_tokens change

2020-01-24 Thread Sergio
https://docs.datastax.com/en/dsbulk/doc/dsbulk/reference/dsbulkLoad.html

Just skimming through the docs

I see examples by loading from CSV / JSON

Maybe there is some other command or doc page that I am missing




On Fri, Jan 24, 2020, 9:10 AM Nitan Kainth  wrote:

> DSBulk works the same as sstableloader.
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On Jan 24, 2020, at 10:40 AM, Sergio  wrote:
>
> 
> I was wondering if that improvement for token allocation would work even
> with just one rack. It should but I am not sure.
>
> Does Dsbulk support migration cluster to cluster without CSV or JSON
> export?
>
> Thanks and Regards
>
> On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth  wrote:
>
>> Instead of sstableloader consider dsbulk by datastax.
>>
>> On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback <
>> rpinchb...@tripadvisor.com> wrote:
>>
>>> Jon Haddad has previously made the case for num_tokens=4.  His
>>> Accelerate 2019 talk is available at:
>>>
>>>
>>>
>>> https://www.youtube.com/watch?v=swL7bCnolkU
>>>
>>>
>>>
>>> You might want to check that out.  Also I think the amount of effort you
>>> put into evening out the token distribution increases as vnode count
>>> shrinks.  The caveats are explored at:
>>>
>>>
>>>
>>>
>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>>
>>>
>>>
>>>
>>>
>>> *From: *Voytek Jarnot 
>>> *Reply-To: *"user@cassandra.apache.org" 
>>> *Date: *Friday, January 24, 2020 at 10:39 AM
>>> *To: *"user@cassandra.apache.org" 
>>> *Subject: *sstableloader & num_tokens change
>>>
>>>
>>>
>>>
>>> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different
>>> 4 node RF=3 cluster.
>>>
>>>
>>>
>>> I've read that 256 is not an optimal default num_tokens value, and that
>>> 32 is likely a better option.
>>>
>>>
>>>
>>> We have the "opportunity" to switch, as we're migrating environments and
>>> will likely be using sstableloader to do so. I'm curious if there are any
>>> gotchas with using sstableloader to restore snapshots taken from 256-token
>>> nodes into a cluster with 32-token nodes (otherwise same # of nodes and
>>> same RF).
>>>
>>>
>>>
>>> Thanks in advance.
>>>
>>


Re: sstableloader & num_tokens change

2020-01-24 Thread Nitan Kainth
DSBulk works the same as sstableloader.


Regards,
Nitan
Cell: 510 449 9629

> On Jan 24, 2020, at 10:40 AM, Sergio  wrote:
> 
> 
> I was wondering if that improvement for token allocation would work even with 
> just one rack. It should but I am not sure.
> 
> Does Dsbulk support migration cluster to cluster without CSV or JSON export?
> 
> Thanks and Regards
> 
>> On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth  wrote:
>> Instead of sstableloader consider dsbulk by datastax. 
>> 
>>> On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback 
>>>  wrote:
>>> Jon Haddad has previously made the case for num_tokens=4.  His Accelerate 
>>> 2019 talk is available at:
>>> 
>>>  
>>> 
>>> https://www.youtube.com/watch?v=swL7bCnolkU
>>> 
>>>  
>>> 
>>> You might want to check that out.  Also I think the amount of effort you 
>>> put into evening out the token distribution increases as vnode count 
>>> shrinks.  The caveats are explored at:
>>> 
>>>  
>>> 
>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>> 
>>>  
>>> 
>>>  
>>> 
>>> From: Voytek Jarnot 
>>> Reply-To: "user@cassandra.apache.org" 
>>> Date: Friday, January 24, 2020 at 10:39 AM
>>> To: "user@cassandra.apache.org" 
>>> Subject: sstableloader & num_tokens change
>>> 
>>>  
>>> 
>>> 
>>> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4 
>>> node RF=3 cluster.
>>> 
>>>  
>>> 
>>> I've read that 256 is not an optimal default num_tokens value, and that 32 
>>> is likely a better option.
>>> 
>>>  
>>> 
>>> We have the "opportunity" to switch, as we're migrating environments and 
>>> will likely be using sstableloader to do so. I'm curious if there are any 
>>> gotchas with using sstableloader to restore snapshots taken from 256-token 
>>> nodes into a cluster with 32-token nodes (otherwise same # of nodes and 
>>> same RF).
>>> 
>>>  
>>> 
>>> Thanks in advance.


Re: sstableloader & num_tokens change

2020-01-24 Thread Voytek Jarnot
Why? Seems to me that the old Cassandra -> CSV/JSON and CSV/JSON -> new
Cassandra are unnecessary steps in my case.

On Fri, Jan 24, 2020 at 10:34 AM Nitan Kainth  wrote:

> Instead of sstableloader consider dsbulk by datastax.
>
> On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback <
> rpinchb...@tripadvisor.com> wrote:
>
>> Jon Haddad has previously made the case for num_tokens=4.  His Accelerate
>> 2019 talk is available at:
>>
>>
>>
>> https://www.youtube.com/watch?v=swL7bCnolkU
>>
>>
>>
>> You might want to check that out.  Also I think the amount of effort you
>> put into evening out the token distribution increases as vnode count
>> shrinks.  The caveats are explored at:
>>
>>
>>
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>
>>
>>
>>
>>
>> *From: *Voytek Jarnot 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Friday, January 24, 2020 at 10:39 AM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *sstableloader & num_tokens change
>>
>>
>>
>>
>> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4
>> node RF=3 cluster.
>>
>>
>>
>> I've read that 256 is not an optimal default num_tokens value, and that
>> 32 is likely a better option.
>>
>>
>>
>> We have the "opportunity" to switch, as we're migrating environments and
>> will likely be using sstableloader to do so. I'm curious if there are any
>> gotchas with using sstableloader to restore snapshots taken from 256-token
>> nodes into a cluster with 32-token nodes (otherwise same # of nodes and
>> same RF).
>>
>>
>>
>> Thanks in advance.
>>
>


Re: sstableloader & num_tokens change

2020-01-24 Thread Sergio
I was wondering if that improvement for token allocation would work even
with just one rack. It should but I am not sure.

Does Dsbulk support migration cluster to cluster without CSV or JSON export?

Thanks and Regards

On Fri, Jan 24, 2020, 8:34 AM Nitan Kainth  wrote:

> Instead of sstableloader consider dsbulk by datastax.
>
> On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback <
> rpinchb...@tripadvisor.com> wrote:
>
>> Jon Haddad has previously made the case for num_tokens=4.  His Accelerate
>> 2019 talk is available at:
>>
>>
>>
>> https://www.youtube.com/watch?v=swL7bCnolkU
>>
>>
>>
>> You might want to check that out.  Also I think the amount of effort you
>> put into evening out the token distribution increases as vnode count
>> shrinks.  The caveats are explored at:
>>
>>
>>
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>
>>
>>
>>
>>
>> *From: *Voytek Jarnot 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Friday, January 24, 2020 at 10:39 AM
>> *To: *"user@cassandra.apache.org" 
>> *Subject: *sstableloader & num_tokens change
>>
>>
>>
>> *Message from External Sender*
>>
>> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4
>> node RF=3 cluster.
>>
>>
>>
>> I've read that 256 is not an optimal default num_tokens value, and that
>> 32 is likely a better option.
>>
>>
>>
>> We have the "opportunity" to switch, as we're migrating environments and
>> will likely be using sstableloader to do so. I'm curious if there are any
>> gotchas with using sstableloader to restore snapshots taken from 256-token
>> nodes into a cluster with 32-token nodes (otherwise same # of nodes and
>> same RF).
>>
>>
>>
>> Thanks in advance.
>>
>


Re: sstableloader & num_tokens change

2020-01-24 Thread Nitan Kainth
Instead of sstableloader consider dsbulk by datastax.

On Fri, Jan 24, 2020 at 10:20 AM Reid Pinchback 
wrote:

> Jon Haddad has previously made the case for num_tokens=4.  His Accelerate
> 2019 talk is available at:
>
>
>
> https://www.youtube.com/watch?v=swL7bCnolkU
>
>
>
> You might want to check that out.  Also I think the amount of effort you
> put into evening out the token distribution increases as vnode count
> shrinks.  The caveats are explored at:
>
>
>
>
> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>
>
>
>
>
> *From: *Voytek Jarnot 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Friday, January 24, 2020 at 10:39 AM
> *To: *"user@cassandra.apache.org" 
> *Subject: *sstableloader & num_tokens change
>
>
>
> *Message from External Sender*
>
> Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4
> node RF=3 cluster.
>
>
>
> I've read that 256 is not an optimal default num_tokens value, and that 32
> is likely a better option.
>
>
>
> We have the "opportunity" to switch, as we're migrating environments and
> will likely be using sstableloader to do so. I'm curious if there are any
> gotchas with using sstableloader to restore snapshots taken from 256-token
> nodes into a cluster with 32-token nodes (otherwise same # of nodes and
> same RF).
>
>
>
> Thanks in advance.
>


Re: sstableloader & num_tokens change

2020-01-24 Thread Reid Pinchback
Jon Haddad has previously made the case for num_tokens=4.  His Accelerate 2019 
talk is available at:

https://www.youtube.com/watch?v=swL7bCnolkU

You might want to check that out.  Also I think the amount of effort you put 
into evening out the token distribution increases as vnode count shrinks.  The 
caveats are explored at:

https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html


From: Voytek Jarnot 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, January 24, 2020 at 10:39 AM
To: "user@cassandra.apache.org" 
Subject: sstableloader & num_tokens change

Message from External Sender
Running 3.11.x, 4 nodes RF=3, default 256 tokens; moving to a different 4 node 
RF=3 cluster.

I've read that 256 is not an optimal default num_tokens value, and that 32 is 
likely a better option.

We have the "opportunity" to switch, as we're migrating environments and will 
likely be using sstableloader to do so. I'm curious if there are any gotchas 
with using sstableloader to restore snapshots taken from 256-token nodes into a 
cluster with 32-token nodes (otherwise same # of nodes and same RF).

Thanks in advance.
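[Editor's note] The restore Voytek describes can be sketched roughly as below (dry run; contact points and paths are placeholders, and the snapshot files must be staged in a keyspace/table/ directory layout). sstableloader asks the target ring which node owns which range and streams accordingly, which is why a 256-token source and a 32-token target shouldn't matter:

```shell
# Placeholders throughout -- substitute your own hosts and paths.
TARGET_NODES="10.0.0.1,10.0.0.2"                  # nodes in the new 32-token cluster
TABLE_DIR="/backups/snap1/my_keyspace/my_table"   # staged snapshot files

CMD="sstableloader -d $TARGET_NODES $TABLE_DIR"
echo "$CMD"    # dry run; run the printed command to actually stream
```

Repeat per table; multiple instances can run in parallel from separate source filesystems.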


Re: [EXTERNAL] Re: Sstableloader

2019-05-30 Thread Goetz, Anthony
It appears you have two goals you are trying to accomplish at the same time.  
My recommendation is to break it into two different steps.  You need to decide 
if you are going to upgrade DSE or OSS.


  *   Upgrade DSE then migrate to OSS
      *   Upgrade DSE to version that matches OSS 3.11.3 binary
      *   Perform datacenter switch
  *   Migrate to OSS then upgrade
      *   Migrate to OSS using version that matches DSE Cassandra binary (DSE
          5.0.7 = 3.0.11)
      *   Upgrade OSS to 3.11.3 binary

From: Rahul Reddy 
Date: Thursday, May 30, 2019 at 6:37 AM
To: Cassandra User List 
Cc: Anthony Goetz 
Subject: [EXTERNAL] Re: Sstableloader

Thank you Anthony and Jonathan. To add a new ring it doesn't have to be the same 
version of Cassandra, right? For example, DSE 5.1.2 (which is 3.11.0) has sstables 
with the mc name and Apache 3.11.3 also uses sstable names with mc. We should 
still be able to add it to the ring, correct?

On Wed, May 29, 2019, 9:55 PM Goetz, Anthony 
mailto:anthony_goe...@comcast.com>> wrote:
My team migrated from DSE to OSS a few years ago by doing datacenter switch.  
You will need to update replication strategy for all keyspaces that are using 
Everywhere to NetworkTopologyStrategy before adding any OSS nodes.  As Jonathan 
mentioned, DSE nodes will revert this change on restart.  To account for this, 
we modified our init script to call a cql script that would make sure the 
keyspaces were set back to NetworkTopologyStrategy.

High Level Plan:

  *   Find DSE Cassandra binary version
  *   Review config to make sure you are not using any DSE specific settings
  *   Update replication strategy on keyspaces using Everywhere to 
NetworkTopologyStrategy
  *   Add OSS DC using same binary version as DSE
  *   Migrate clients to new OSS DC
  *   Decommission DSE DC

Note:  OpsCenter will stop working once you add OSS nodes.

From: Jonathan Koppenhofer mailto:j...@koppedomain.com>>
Reply-To: Cassandra User List 
mailto:user@cassandra.apache.org>>
Date: Wednesday, May 29, 2019 at 6:45 PM
To: Cassandra User List 
mailto:user@cassandra.apache.org>>
Subject: [EXTERNAL] Re: Sstableloader

Has anyone tried to do a DC switch as a means to migrate from Datastax to OSS? 
This would be the safest route as the ability to revert back to Datastax is 
easy. However, I'm curious how the dse_system keyspace would be replicated to 
OSS using their custom Everywhere strategy. You may have to change them to 
NetworkTopologyStrategy before firing up OSS nodes. Also, keep in mind if you 
restart any DSE nodes, it will revert that keyspace back to EverywhereStrategy.

I also posted a means to migrate in place on this mailing list a few months 
back (thanks for help from others on the mailing list), but it is a little more 
involved and risky. Let me know if you can't find it, and I'll dig it up.

Finally, DSE 5.0 is open source equivalent 3.0.x. recommend you go to OSS 3.0 
then up to 3.11.
On Wed, May 29, 2019, 5:56 PM Nitan Kainth 
mailto:nitankai...@gmail.com>> wrote:
If cassandra version is same, it should work

Regards,
Nitan
Cell: 510 449 9629

On May 28, 2019, at 4:21 PM, Rahul Reddy 
mailto:rahulreddy1...@gmail.com>> wrote:
Hello,

Does sstableloader works between datastax and Apache cassandra. I'm trying to 
migrate dse 5.0.7 to Apache 3.11.1 ?


Re: Sstableloader

2019-05-30 Thread Rahul Reddy
Thank you Anthony and Jonathan. To add a new ring it doesn't have to be the
same version of Cassandra, right? For example, DSE 5.1.2 (which is 3.11.0)
has sstables with the mc name and Apache 3.11.3 also uses sstable names with
mc. We should still be able to add it to the ring, correct?

On Wed, May 29, 2019, 9:55 PM Goetz, Anthony 
wrote:

> My team migrated from DSE to OSS a few years ago by doing datacenter
> switch.  You will need to update replication strategy for all keyspaces
> that are using Everywhere to NetworkTopologyStrategy before adding any OSS
> nodes.  As Jonathan mentioned, DSE nodes will revert this change on
> restart.  To account for this, we modified our init script to call a cql
> script that would make sure the keyspaces were set back to
> NetworkTopologyStrategy.
>
>
>
> High Level Plan:
>
>- Find DSE Cassandra binary version
>- Review config to make sure you are not using any DSE specific
>settings
>- Update replication strategy on keyspaces using Everywhere to
>NetworkTopologyStrategy
>- Add OSS DC using same binary version as DSE
>- Migrate clients to new OSS DC
>- Decommission DSE DC
>
>
>
> Note:  OpsCenter will stop working once you add OSS nodes.
>
>
>
> *From: *Jonathan Koppenhofer 
> *Reply-To: *Cassandra User List 
> *Date: *Wednesday, May 29, 2019 at 6:45 PM
> *To: *Cassandra User List 
> *Subject: *[EXTERNAL] Re: Sstableloader
>
>
>
> Has anyone tried to do a DC switch as a means to migrate from Datastax to
> OSS? This would be the safest route as the ability to revert back to
> Datastax is easy. However, I'm curious how the dse_system keyspace would be
> replicated to OSS using their custom Everywhere strategy. You may have to
> change them to NetworkTopologyStrategy before firing up OSS nodes. Also,
> keep in mind if you restart any DSE nodes, it will revert that keyspace
> back to EverywhereStrategy.
>
>
>
> I also posted a means to migrate in place on this mailing list a few
> months back (thanks for help from others on the mailing list), but it is a
> little more involved and risky. Let me know if you can't find it, and I'll
> dig it up.
>
>
>
> Finally, DSE 5.0 is open source equivalent 3.0.x. recommend you go to OSS
> 3.0 then up to 3.11.
>
> On Wed, May 29, 2019, 5:56 PM Nitan Kainth  wrote:
>
> If cassandra version is same, it should work
>
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
>
> On May 28, 2019, at 4:21 PM, Rahul Reddy  wrote:
>
> Hello,
>
>
>
> Does sstableloader works between datastax and Apache cassandra. I'm trying
> to migrate dse 5.0.7 to Apache 3.11.1 ?
>
>


Re: Sstableloader

2019-05-29 Thread Patrick Lee
Over the past year we've migrated several clusters from DSE to Apache
Cassandra. We've mostly done in-place conversions node by node with no
downtime.  DSE 4.8.x to Apache Cassandra 2.1.x.

On Wed, May 29, 2019 at 8:55 PM Goetz, Anthony 
wrote:

> My team migrated from DSE to OSS a few years ago by doing datacenter
> switch.  You will need to update replication strategy for all keyspaces
> that are using Everywhere to NetworkTopologyStrategy before adding any OSS
> nodes.  As Jonathan mentioned, DSE nodes will revert this change on
> restart.  To account for this, we modified our init script to call a cql
> script that would make sure the keyspaces were set back to
> NetworkTopologyStrategy.
>
>
>
> High Level Plan:
>
>- Find DSE Cassandra binary version
>- Review config to make sure you are not using any DSE specific
>settings
>- Update replication strategy on keyspaces using Everywhere to
>NetworkTopologyStrategy
>- Add OSS DC using same binary version as DSE
>- Migrate clients to new OSS DC
>- Decommission DSE DC
>
>
>
> Note:  OpsCenter will stop working once you add OSS nodes.
>
>
>
> *From: *Jonathan Koppenhofer 
> *Reply-To: *Cassandra User List 
> *Date: *Wednesday, May 29, 2019 at 6:45 PM
> *To: *Cassandra User List 
> *Subject: *[EXTERNAL] Re: Sstableloader
>
>
>
> Has anyone tried to do a DC switch as a means to migrate from Datastax to
> OSS? This would be the safest route as the ability to revert back to
> Datastax is easy. However, I'm curious how the dse_system keyspace would be
> replicated to OSS using their custom Everywhere strategy. You may have to
> change them to NetworkTopologyStrategy before firing up OSS nodes. Also,
> keep in mind if you restart any DSE nodes, it will revert that keyspace
> back to EverywhereStrategy.
>
>
>
> I also posted a means to migrate in place on this mailing list a few
> months back (thanks for help from others on the mailing list), but it is a
> little more involved and risky. Let me know if you can't find it, and I'll
> dig it up.
>
>
>
> Finally, DSE 5.0 is open source equivalent 3.0.x. recommend you go to OSS
> 3.0 then up to 3.11.
>
> On Wed, May 29, 2019, 5:56 PM Nitan Kainth  wrote:
>
> If cassandra version is same, it should work
>
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
>
> On May 28, 2019, at 4:21 PM, Rahul Reddy  wrote:
>
> Hello,
>
>
>
> Does sstableloader works between datastax and Apache cassandra. I'm trying
> to migrate dse 5.0.7 to Apache 3.11.1 ?
>
>


Re: Sstableloader

2019-05-29 Thread Goetz, Anthony
My team migrated from DSE to OSS a few years ago by doing datacenter switch.  
You will need to update replication strategy for all keyspaces that are using 
Everywhere to NetworkTopologyStrategy before adding any OSS nodes.  As Jonathan 
mentioned, DSE nodes will revert this change on restart.  To account for this, 
we modified our init script to call a cql script that would make sure the 
keyspaces were set back to NetworkTopologyStrategy.

High Level Plan:

  *   Find DSE Cassandra binary version
  *   Review config to make sure you are not using any DSE specific settings
  *   Update replication strategy on keyspaces using Everywhere to 
NetworkTopologyStrategy
  *   Add OSS DC using same binary version as DSE
  *   Migrate clients to new OSS DC
  *   Decommission DSE DC

Note:  OpsCenter will stop working once you add OSS nodes.
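[Editor's note] The replication change above might be scripted along these lines from an init hook (keyspace and datacenter names are placeholders; printed as a dry run because, as noted, DSE reverts the strategy on restart and it must be re-applied):

```shell
# Placeholder keyspace/DC names. Keyspaces on DSE's EverywhereStrategy
# must be switched to NetworkTopologyStrategy before any OSS node joins.
ALTER_CQL="ALTER KEYSPACE dse_system WITH replication = \
{'class': 'NetworkTopologyStrategy', 'DC1': 3};"
echo cqlsh -e "$ALTER_CQL"   # dry run; drop 'echo' to apply, and re-apply after any DSE node restart
```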

From: Jonathan Koppenhofer 
Reply-To: Cassandra User List 
Date: Wednesday, May 29, 2019 at 6:45 PM
To: Cassandra User List 
Subject: [EXTERNAL] Re: Sstableloader

Has anyone tried to do a DC switch as a means to migrate from Datastax to OSS? 
This would be the safest route as the ability to revert back to Datastax is 
easy. However, I'm curious how the dse_system keyspace would be replicated to 
OSS using their custom Everywhere strategy. You may have to change them to 
NetworkTopologyStrategy before firing up OSS nodes. Also, keep in mind if you 
restart any DSE nodes, it will revert that keyspace back to EverywhereStrategy.

I also posted a means to migrate in place on this mailing list a few months 
back (thanks for help from others on the mailing list), but it is a little more 
involved and risky. Let me know if you can't find it, and I'll dig it up.

Finally, DSE 5.0 is open source equivalent 3.0.x. recommend you go to OSS 3.0 
then up to 3.11.
On Wed, May 29, 2019, 5:56 PM Nitan Kainth 
mailto:nitankai...@gmail.com>> wrote:
If cassandra version is same, it should work

Regards,
Nitan
Cell: 510 449 9629

On May 28, 2019, at 4:21 PM, Rahul Reddy 
mailto:rahulreddy1...@gmail.com>> wrote:
Hello,

Does sstableloader works between datastax and Apache cassandra. I'm trying to 
migrate dse 5.0.7 to Apache 3.11.1 ?


Re: Sstableloader

2019-05-29 Thread Jonathan Koppenhofer
Has anyone tried to do a DC switch as a means to migrate from Datastax to
OSS? This would be the safest route as the ability to revert back to
Datastax is easy. However, I'm curious how the dse_system keyspace would be
replicated to OSS using their custom Everywhere strategy. You may have to
change them to NetworkTopologyStrategy before firing up OSS nodes. Also,
keep in mind if you restart any DSE nodes, it will revert that keyspace
back to EverywhereStrategy.

I also posted a means to migrate in place on this mailing list a few months
back (thanks for help from others on the mailing list), but it is a little
more involved and risky. Let me know if you can't find it, and I'll dig it
up.

Finally, DSE 5.0 is the open-source equivalent of 3.0.x; I recommend you go to OSS
3.0, then up to 3.11.

On Wed, May 29, 2019, 5:56 PM Nitan Kainth  wrote:

> If cassandra version is same, it should work
>
>
> Regards,
>
> Nitan
>
> Cell: 510 449 9629
>
> On May 28, 2019, at 4:21 PM, Rahul Reddy  wrote:
>
> Hello,
>
> Does sstableloader works between datastax and Apache cassandra. I'm trying
> to migrate dse 5.0.7 to Apache 3.11.1 ?
>
>


Re: Sstableloader

2019-05-29 Thread Nitan Kainth
If cassandra version is same, it should work


Regards,
Nitan
Cell: 510 449 9629

> On May 28, 2019, at 4:21 PM, Rahul Reddy  wrote:
> 
> Hello,
> 
> Does sstableloader works between datastax and Apache cassandra. I'm trying to 
> migrate dse 5.0.7 to Apache 3.11.1 ?


Re: Sstableloader

2019-05-29 Thread Alain RODRIGUEZ
Hello,

I can't answer this question about sstableloader (even though I think
it should be OK). My understanding, even though I'm not really up to date
with the latest Datastax work, is that DSE uses a modified but compatible
version of Cassandra for everything that is not a 'DSE feature'
specifically. In particular, I expect the SSTable format to be the same.
sstableloader has always been slow and inefficient for me, though I did not
use it much.

I think the way out of DSE should be documented somewhere in the Datastax docs;
if not, I think you can ask Datastax directly (or maybe someone here can help
you).

My guess is that the safest way out, without any downtime is probably to
perform a datacenter 'switch':
- Identify the Apache Cassandra version used under the hood by DSE (5.0.7).
Let's say it's 3.11.1 (I don't know)
- Add a new Apache Cassandra datacenter to your DSE cluster using this
version (I would rather use 3.11.latest in this case though... 3.11.1 had
memory leaks and other wild issues).
- Move client to this new DC
- Shutdown the old DC.
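[Editor's note] Two of the steps above can be sketched as commands (dry run; keyspace and DC names are placeholders). `release_version` in `system.local` reports the Cassandra version DSE is running, and the ALTER extends each keyspace's replication into the new OSS datacenter before it is rebuilt:

```shell
# Step 1: find the Cassandra version under the DSE hood.
VERSION_CQL="SELECT release_version FROM system.local;"
# Step 2 (placeholder names): replicate each keyspace into the new DC.
EXTEND_CQL="ALTER KEYSPACE my_ks WITH replication = \
{'class': 'NetworkTopologyStrategy', 'dse_dc': 3, 'oss_dc': 3};"
echo cqlsh -e "$VERSION_CQL"
echo cqlsh -e "$EXTEND_CQL"   # dry run; drop the 'echo's to execute
```

After the ALTER, run `nodetool rebuild -- dse_dc` on each new-DC node, as in the runbook linked below.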

I wrote a runbook to perform such an operation not that long ago, you can
find it here:
https://thelastpickle.com/blog/2019/02/26/data-center-switch.html

I don't know for sure that this is the best way to go out of DSE, but that
would be my guess and the first thing I would investigate (before
SSTableLoader, clearly).

Hope that helps, even though it does not directly answer the question
(that I'm unable to answer) about SSTable & SSTableLoader compatibility
with DSE clusters.

C*heers

Le mar. 28 mai 2019 à 22:22, Rahul Reddy  a
écrit :

> Hello,
>
> Does sstableloader works between datastax and Apache cassandra. I'm trying
> to migrate dse 5.0.7 to Apache 3.11.1 ?
>


Re: sstableloader from dse 4.8.4 to apache cassandra 3.11.1

2018-06-19 Thread rajpal reddy
Never mind, found it. It's not a supported version.

> On Jun 19, 2018, at 2:41 PM, rajpal reddy  wrote:
> 
> 
> Hello,
> 
> I’m trying to use sstableloader from DSE 4.8.4 (2.1.12) to Apache 3.11.1; I’m 
> getting the below error, but it works fine when I use the sstableloader from 
> DSE 5.1.2 (Apache 3.11.0):
> Could not retrieve endpoint ranges: 
> java.io.IOException: Failed to open transport to: host-ip:9160.
> 
> Is there any workaround to use sstableloader from DSE 4.8.4 (Apache 2.1.12) to 
> Apache 3.11.1?
> 
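[Editor's note] For context on the 9160 error above: the 2.1-era sstableloader discovers the ring over Thrift, and 3.x ships with Thrift disabled (`start_rpc: false`). Enabling it would get past the "Failed to open transport" message, but the streaming protocol between 2.1 and 3.11 still differs, so this only explains the symptom — it is not a workaround:

```shell
# On the 3.11 target, Thrift (port 9160) is off by default, which is what
# produces 'Failed to open transport'. Enabling it removes that symptom
# only; the 2.1 <-> 3.11 streaming versions remain incompatible.
THRIFT_CMD="nodetool enablethrift"   # or set start_rpc: true in cassandra.yaml
echo "$THRIFT_CMD"   # dry run
```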


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: SSTableLoader Question

2018-02-19 Thread shalom sagges
Sounds good.

Thanks for the explanation!

On Sun, Feb 18, 2018 at 5:15 PM, Rahul Singh 
wrote:

> If you don’t have access to the file you don’t have access to the file.
> I’ve seen this issue several times. It’s the easiest low-hanging fruit to
> resolve. So figure it out, make sure ownership is cassandra:cassandra from
> root down to the data folder, and either run as root or sudo it.
>
> If it’s compacted it won’t be there so you won’t have the file. I’m not
> aware of this event being communicated to Sstableloader via SEDA. Besides,
> the sstable that you are loading SHOULD not be live. If you are streaming a
> live sstable, it means you are using sstableloader not as it is designed to
> be used - which is with static files.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 18, 2018, 9:22 AM -0500, shalom sagges ,
> wrote:
>
> Not really sure with which user I ran it (root or cassandra), although I
> don't understand why a permission issue will generate a File not Found
> exception?
>
> And in general, what if a file is being streamed and got compacted before
> the streaming ended. Does Cassandra know how to handle this?
>
> Thanks!
>
> On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh  > wrote:
>
>> Check permissions maybe? Who owns the files vs. who is running
>> sstableloader.
>>
>> --
>> Rahul Singh
>> rahul.si...@anant.us
>>
>> Anant Corporation
>>
>> On Feb 18, 2018, 4:26 AM -0500, shalom sagges ,
>> wrote:
>>
>> Hi All,
>>
>> C* version 2.0.14.
>>
>> I was loading some data to another cluster using SSTableLoader. The
>> streaming failed with the following error:
>>
>>
>> Streaming error occurred
>> java.lang.RuntimeException: java.io.*FileNotFoundException*:
>> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file
>> or directory)
>> at org.apache.cassandra.io.compress.CompressedRandomAccessReade
>> r.open(CompressedRandomAccessReader.java:59)
>> at org.apache.cassandra.io.sstable.SSTableReader.openDataReader
>> (SSTableReader.java:1409)
>> at org.apache.cassandra.streaming.compress.CompressedStreamWrit
>> er.write(CompressedStreamWriter.java:55)
>> at org.apache.cassandra.streaming.messages.OutgoingFileMessage$
>> 1.serialize(OutgoingFileMessage.java:59)
>> at org.apache.cassandra.streaming.messages.OutgoingFileMessage$
>> 1.serialize(OutgoingFileMessage.java:42)
>> at org.apache.cassandra.streaming.messages.StreamMessage.serial
>> ize(StreamMessage.java:45)
>> at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMes
>> sageHandler.sendMessage(ConnectionHandler.java:339)
>> at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMes
>> sageHandler.run(ConnectionHandler.java:311)
>> at java.lang.Thread.run(Thread.java:722)
>> Caused by: java.io.*FileNotFoundException*:
>> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file
>> or directory)
>> at java.io.RandomAccessFile.open(Native Method)
>> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>> at org.apache.cassandra.io.util.RandomAccessReader.<init>(Rando
>> mAccessReader.java:58)
>> at org.apache.cassandra.io.compress.CompressedRandomAccessReade
>> r.<init>(CompressedRandomAccessReader.java:76)
>> at org.apache.cassandra.io.compress.CompressedRandomAccessReade
>> r.open(CompressedRandomAccessReader.java:55)
>> ... 8 more
>>  WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream
>> failed
>>
>>
>>
>> Did I miss something when running the load? Was the file suddenly missing
>> due to compaction?
>> If so, did I need to disable auto compaction or stop the service
>> beforehand? (didn't find any reference to compaction in the docs)
>>
>> I know it's an old version, but I didn't find any related bugs on "File
>> not found" exceptions.
>>
>> Thanks!
>>
>>
>>
>


Re: SSTableLoader Question

2018-02-18 Thread Rahul Singh
If you don’t have access to the file you don’t have access to the file. I’ve 
seen this issue several times. It’s the easiest low-hanging fruit to resolve. So 
figure it out, make sure ownership is cassandra:cassandra from root down to the 
data folder, and either run as root or sudo it.

If it’s compacted it won’t be there so you won’t have the file. I’m not aware 
of this event being communicated to Sstableloader via SEDA. Besides, the 
sstable that you are loading SHOULD not be live. If you are streaming a live 
sstable, it means you are using sstableloader not as it is designed to be used 
- which is with static files.
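[Editor's note] Rahul's point about static files suggests the usual pattern: snapshot first, so the files you stream are immutable hard links that compaction cannot delete mid-stream, then point sstableloader at a staged keyspace/table/ directory. A dry-run sketch with placeholder names and paths:

```shell
# All names/paths are placeholders; run each printed command for real.
SNAP_CMD="nodetool snapshot -t loader_src my_keyspace"
STAGE_CMD="cp /var/lib/cassandra/data/my_keyspace/my_table-*/snapshots/loader_src/* /staging/my_keyspace/my_table/"
LOAD_CMD="sstableloader -d target-node /staging/my_keyspace/my_table"
printf '%s\n' "$SNAP_CMD" "$STAGE_CMD" "$LOAD_CMD"   # dry run
```

Staging is needed because sstableloader infers keyspace and table from the last two path components, which a snapshots/ directory does not provide.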

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 9:22 AM -0500, shalom sagges , wrote:
> Not really sure with which user I ran it (root or cassandra), although I 
> don't understand why a permission issue will generate a File not Found 
> exception?
>
> And in general, what if a file is being streamed and got compacted before the 
> streaming ended. Does Cassandra know how to handle this?
>
> Thanks!
>
> > On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh  
> > wrote:
> > > Check permissions maybe? Who owns the files vs. who is running 
> > > sstableloader.
> > >
> > > --
> > > Rahul Singh
> > > rahul.si...@anant.us
> > >
> > > Anant Corporation
> > >
> > > On Feb 18, 2018, 4:26 AM -0500, shalom sagges , 
> > > wrote:
> > > > Hi All,
> > > >
> > > > C* version 2.0.14.
> > > >
> > > > I was loading some data to another cluster using SSTableLoader. The 
> > > > streaming failed with the following error:
> > > >
> > > >
> > > > Streaming error occurred
> > > > java.lang.RuntimeException: java.io.FileNotFoundException: 
> > > > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file 
> > > > or directory)
> > > >     at 
> > > > org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
> > > >     at 
> > > > org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
> > > >     at 
> > > > org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
> > > >     at 
> > > > org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
> > > >     at 
> > > > org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
> > > >     at 
> > > > org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
> > > >     at 
> > > > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
> > > >     at 
> > > > org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
> > > >     at java.lang.Thread.run(Thread.java:722)
> > > > Caused by: java.io.FileNotFoundException: 
> > > > /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file 
> > > > or directory)
> > > >     at java.io.RandomAccessFile.open(Native Method)
> > > >     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> > > >     at 
> > > > org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
> > > >     at 
> > > > org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
> > > >     at 
> > > > org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
> > > >     ... 8 more
> > > >  WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] 
> > > > Stream failed
> > > >
> > > >
> > > >
> > > > Did I miss something when running the load? Was the file suddenly 
> > > > missing due to compaction?
> > > > If so, did I need to disable auto compaction or stop the service 
> > > > beforehand? (didn't find any reference to compaction in the docs)
> > > >
> > > > I know it's an old version, but I didn't find any related bugs on "File 
> > > > not found" exceptions.
> > > >
> > > > Thanks!
> > > >
> > > >
>


Re: SSTableLoader Question

2018-02-18 Thread shalom sagges
Not really sure with which user I ran it (root or cassandra), although I
don't understand why a permission issue would generate a File Not Found
exception.

And in general, what if a file is being streamed and gets compacted before
the streaming ends? Does Cassandra know how to handle this?

Thanks!

On Sun, Feb 18, 2018 at 3:58 PM, Rahul Singh 
wrote:

> Check permissions maybe? Who owns the files vs. who is running
> sstableloader.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On Feb 18, 2018, 4:26 AM -0500, shalom sagges ,
> wrote:
>
> Hi All,
>
> C* version 2.0.14.
>
> I was loading some data to another cluster using SSTableLoader. The
> streaming failed with the following error:
>
>
> Streaming error occurred
> java.lang.RuntimeException: java.io.*FileNotFoundException*:
> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file
> or directory)
> at org.apache.cassandra.io.compress.CompressedRandomAccessReade
> r.open(CompressedRandomAccessReader.java:59)
> at org.apache.cassandra.io.sstable.SSTableReader.openDataReader
> (SSTableReader.java:1409)
> at org.apache.cassandra.streaming.compress.CompressedStreamWrit
> er.write(CompressedStreamWriter.java:55)
> at org.apache.cassandra.streaming.messages.OutgoingFileMessage$
> 1.serialize(OutgoingFileMessage.java:59)
> at org.apache.cassandra.streaming.messages.OutgoingFileMessage$
> 1.serialize(OutgoingFileMessage.java:42)
> at org.apache.cassandra.streaming.messages.StreamMessage.
> serialize(StreamMessage.java:45)
> at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMes
> sageHandler.sendMessage(ConnectionHandler.java:339)
> at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMes
> sageHandler.run(ConnectionHandler.java:311)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.*FileNotFoundException*: /data1/keyspace1/table1/
> keyspace1-table1-jb-65174-Data.db (No such file or directory)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
> at org.apache.cassandra.io.util.RandomAccessReader.<init>(Rando
> mAccessReader.java:58)
> at org.apache.cassandra.io.compress.CompressedRandomAccessReade
> r.<init>(CompressedRandomAccessReader.java:76)
> at org.apache.cassandra.io.compress.CompressedRandomAccessReade
> r.open(CompressedRandomAccessReader.java:55)
> ... 8 more
>  WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream
> failed
>
>
>
> Did I miss something when running the load? Was the file suddenly missing
> due to compaction?
> If so, did I need to disable auto compaction or stop the service
> beforehand? (didn't find any reference to compaction in the docs)
>
> I know it's an old version, but I didn't find any related bugs on "File
> not found" exceptions.
>
> Thanks!
>
>
>


Re: SSTableLoader Question

2018-02-18 Thread Rahul Singh
Check permissions maybe? Who owns the files vs. who is running sstableloader.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Feb 18, 2018, 4:26 AM -0500, shalom sagges , wrote:
> Hi All,
>
> C* version 2.0.14.
>
> I was loading some data to another cluster using SSTableLoader. The streaming 
> failed with the following error:
>
>
> Streaming error occurred
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or 
> directory)
>     at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:59)
>     at 
> org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1409)
>     at 
> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:55)
>     at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
>     at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>     at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>     at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>     at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.FileNotFoundException: 
> /data1/keyspace1/table1/keyspace1-table1-jb-65174-Data.db (No such file or 
> directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
>     at 
> org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:58)
>     at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
>     at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:55)
>     ... 8 more
>  WARN 18:31:35,938 [Stream #7243efb0-1262-11e8-8562-d19d5fe7829c] Stream 
> failed
>
>
>
> Did I miss something when running the load? Was the file suddenly missing due 
> to compaction?
> If so, did I need to disable auto compaction or stop the service beforehand? 
> (didn't find any reference to compaction in the docs)
>
> I know it's an old version, but I didn't find any related bugs on "File not 
> found" exceptions.
>
> Thanks!
>
>


Re: sstableloader making no progress

2017-02-14 Thread Simone Franzini
Adding to the above, each host shows the following log messages that,
despite being at INFO level, appear like stack traces to me:

2017-02-13 15:09:22,166 INFO  [STREAM-INIT-/10.128.X.Y:60306]
 StreamResultFuture.java:116 - [Stream
#afe548d0-f230-11e6-bc5d-8f99f25bfcf7, ID#0] Received streaming plan for
Bulk Load
at clojure.lang.Var.invoke(Var.java:401)
at
opsagent.config_service$update_system$fn__20140.invoke(config_service.clj:205)
at clojure.core$reduce.invoke(core.clj:6518)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at
opsagent.config_service$fn__20217$fn__20218$state_machine__4128__auto20219$fn__20221.invoke(config_service.clj:250)
at
clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:940)
at clojure.core.async$ioc_alts_BANG_$fn__4293.invoke(async.clj:362)
at
clojure.core.async.impl.channels.ManyToManyChannel$fn__624.invoke(channels.clj:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.lang.Thread.run(Thread.java:745)

2017-02-13 15:09:22,208 INFO  [STREAM-IN-/10.128.X.Y]
 StreamResultFuture.java:166 - [Stream
#afe548d0-f230-11e6-bc5d-8f99f25bfcf7 ID#0] Prepare completed. Receiving 3
files(3963 bytes), sending 0 files(0 bytes)
at clojure.lang.ArraySeq.reduce(ArraySeq.java:114)
at opsagent.config_service$update_system.doInvoke(config_service.clj:199)
at opsagent.config_service$start_system_BANG_.invoke(config_service.clj:224)
at
opsagent.config_service$fn__20217$fn__20218$state_machine__4128__auto20219.invoke(config_service.clj:247)
at
clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:944)
at clojure.core.async$do_alts$fn__4247$fn__4250.invoke(async.clj:231)
at clojure.lang.AFn.run(AFn.java:22)

Simone Franzini, PhD

http://www.linkedin.com/in/simonefranzini

On Fri, Feb 10, 2017 at 4:28 PM, Simone Franzini wrote:

> I am trying to ingest some data from a cluster to a different cluster via
> sstableloader. I am running DSE 4.8.7 / Cassandra 2.1.14.
> I have re-created the schemas and followed other instructions here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html
>
> I am initially testing the ingest process with a single table, containing
> 3 really small sstables (just a few KB each):
> sstableloader -v -d  /
> From the console, it appears that the progress quickly reaches 100%, but
> the command never returns:
> progress: [/10.128.X.Y]0:3/3 100% [/10.192.Z.W]0:3/3 100% ... total: 100%
> 0  MB/s(avg: 0 MB/s)
>
> nodetool netstats shows that there is no progress:
> Mode: NORMAL
> Bulk Load e495cea0-efde-11e6-9ec0-8f99f25bfcf7
> /10.128.X.Y
> Receiving 3 files, 3963 bytes total. Already received 0 files, 0
> bytes total
> Bulk Load b2566980-efb7-11e6-a467-8f99f25bfcf7
> /10.128.X.Y
> Receiving 3 files, 3963 bytes total. Already received 0 files, 0
> bytes total
> Bulk Load f31e7810-efdd-11e6-8484-8f99f25bfcf7
> /10.128.X.Y
> Receiving 3 files, 3963 bytes total. Already received 0 files, 0
> bytes total
> ...
> Read Repair Statistics:
> Attempted: 8
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool NameActive   Pending  Completed
> Commandsn/a 02148112
> Responses   n/a 0 977176
>
>
> The logs show the following, but no error or warning message:
> 2017-02-10 16:18:49,096 INFO  [STREAM-INIT-/10.128.X.Y:33302]
>  StreamResultFuture.java:109 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7
> ID#0] Creating new streaming plan for Bulk Load
> 2017-02-10 16:18:49,105 INFO  [STREAM-INIT-/10.128.X.Y:33302]
>  StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7,
> ID#0] Received streaming plan for Bulk Load
> 2017-02-10 16:18:49,110 INFO  [STREAM-INIT-/10.128.X.Y:33306]
>  StreamResultFuture.java:116 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7,
> ID#0] Received streaming plan for Bulk Load
> 2017-02-10 16:18:49,110 INFO  [STREAM-IN-/10.128.X.Y]
>  StreamResultFuture.java:166 - [Stream #e495cea0-efde-11e6-9ec0-8f99f25bfcf7
> ID#0] Prepare completed. Receiving 3 files(3963 bytes), sending 0 files(0
> bytes)
>
>
> Any help would be greatly appreciated.
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>


Re: [Marketing Mail] Re: [Marketing Mail] Re: sstableloader question

2016-10-12 Thread Osman YOZGATLIOGLU
Hello,

It's about 2,500 sstables, worth 25 TB of data.
The -t parameter doesn't make a difference; I tried both -t 1000 and -t 1.
Most probably I'm facing some limitation at the target cluster.
I'm preparing to split the sstables and run up to ten parallel sstableloader 
sessions.

Regards,
Osman

On 11-10-2016 21:46, Rajath Subramanyam wrote:
How many sstables are you trying to load? Running sstableloaders in parallel 
will help. Did you try setting the "-t" parameter to see if you are getting 
the expected throughput?

- Rajath


Rajath Subramanyam


On Mon, Oct 10, 2016 at 2:02 PM, Osman YOZGATLIOGLU wrote:
Hello,

Thank you Adam and Rajath.

I'll split the input sstables and run a parallel job for each.
I tested this approach and ran 3 parallel sstableloader jobs without the -t 
parameter.
I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 to 
600 Mbit/sec on all of the target nodes.
But each job runs at only about 10 MB/sec and generates about 100 Mbit/sec of 
network traffic.
In total this could be much higher. The source and target servers have plenty 
of unused CPU, I/O and network resources.
Do you have any idea how I can increase the speed of the sstableloader jobs?

Regards,
Osman

On 10-10-2016 22:05, Rajath Subramanyam wrote:
Hi Osman,

You cannot restart the streaming only to the failed nodes specifically. You can 
restart the sstableloader job itself. Compaction will eventually take care of 
the redundant rows.

- Rajath


Rajath Subramanyam


On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson wrote:
It'll start over from the beginning.


On Sunday, October 9, 2016, Osman YOZGATLIOGLU wrote:
Hello,

I have a running sstableloader job.
Unfortunately some of the nodes restarted since streaming began.
I see streaming has stopped for those nodes.
Can I restart that streaming somehow?
Or if I restart the sstableloader job, will it start from the beginning?

Regards,
Osman


This e-mail message, including any attachments, is for the sole use of the 
person to whom it has been sent, and may contain information that is 
confidential or legally protected. If you are not the intended recipient or 
have received this message in error, you are not authorized to copy, 
distribute, or otherwise use this message or its attachments. Please notify the 
sender immediately by return e-mail and permanently delete this message and any 
attachments. KRON makes no warranty that this e-mail is error or virus free.


--

Adam Hutson
Data Architect | DataScale
+1 (417) 224-5212
a...@datascale.io










Re: [Marketing Mail] Re: sstableloader question

2016-10-11 Thread Rajath Subramanyam
How many sstables are you trying to load? Running sstableloaders in
parallel will help. Did you try setting the "-t" parameter to see if you
are getting the expected throughput?

- Rajath


Rajath Subramanyam


On Mon, Oct 10, 2016 at 2:02 PM, Osman YOZGATLIOGLU <
osman.yozgatlio...@krontech.com> wrote:

> Hello,
>
> Thank you Adam and Rajath.
>
> I'll split the input sstables and run a parallel job for each.
> I tested this approach and ran 3 parallel sstableloader jobs without the -t
> parameter.
> I raised the stream_throughput_outbound_megabits_per_sec parameter from 200
> to 600 Mbit/sec on all of the target nodes.
> But each job runs at only about 10 MB/sec and generates about 100 Mbit/sec
> of network traffic.
> In total this could be much higher. The source and target servers have
> plenty of unused CPU, I/O and network resources.
> Do you have any idea how I can increase the speed of the sstableloader jobs?
>
> Regards,
> Osman
>
> On 10-10-2016 22:05, Rajath Subramanyam wrote:
> Hi Osman,
>
> You cannot restart the streaming only to the failed nodes specifically.
> You can restart the sstableloader job itself. Compaction will eventually
> take care of the redundant rows.
>
> - Rajath
>
> 
> Rajath Subramanyam
>
>
> On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson wrote:
> It'll start over from the beginning.
>
>
> On Sunday, October 9, 2016, Osman YOZGATLIOGLU <
> osman.yozgatlio...@krontech.com>
> wrote:
> Hello,
>
> I have a running sstableloader job.
> Unfortunately some of the nodes restarted since streaming began.
> I see streaming has stopped for those nodes.
> Can I restart that streaming somehow?
> Or if I restart the sstableloader job, will it start from the beginning?
>
> Regards,
> Osman
>
>
>
>
> --
>
> Adam Hutson
> Data Architect | DataScale
> +1 (417) 224-5212
> a...@datascale.io
>
>
>
>
>


Re: [Marketing Mail] Re: sstableloader question

2016-10-10 Thread Osman YOZGATLIOGLU
Hello,

Thank you Adam and Rajath.

I'll split the input sstables and run a parallel job for each.
I tested this approach and ran 3 parallel sstableloader jobs without the -t 
parameter.
I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 to 
600 Mbit/sec on all of the target nodes.
But each job runs at only about 10 MB/sec and generates about 100 Mbit/sec of 
network traffic.
In total this could be much higher. The source and target servers have plenty 
of unused CPU, I/O and network resources.
Do you have any idea how I can increase the speed of the sstableloader jobs?
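To put the numbers above in a single unit (a quick sketch; the 8-bits-per-byte conversion is the only assumption added here), the throttle and the observed rate are far apart, which points at a per-session or source-side bottleneck rather than the streaming throttle:

```python
def mbit_to_mbyte(mbit_per_sec):
    """Convert a megabit/sec throttle value to megabytes/sec."""
    return mbit_per_sec / 8.0

# stream_throughput_outbound_megabits_per_sec = 600 allows ~75 MB/s,
# yet 3 jobs x ~10 MB/s = ~30 MB/s is observed, so the throttle is
# not what limits these sessions.
throttle_mb = mbit_to_mbyte(600)   # 75.0 MB/s ceiling
observed_mb = 3 * 10.0             # 30.0 MB/s seen across the jobs
assert observed_mb < throttle_mb
```

This is consistent with the advice elsewhere in the thread: adding more parallel sstableloader sessions, rather than raising the throttle further, is what tends to move the aggregate rate.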

Regards,
Osman

On 10-10-2016 22:05, Rajath Subramanyam wrote:
Hi Osman,

You cannot restart the streaming only to the failed nodes specifically. You can 
restart the sstableloader job itself. Compaction will eventually take care of 
the redundant rows.

- Rajath


Rajath Subramanyam


On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson wrote:
It'll start over from the beginning.


On Sunday, October 9, 2016, Osman YOZGATLIOGLU wrote:
Hello,

I have a running sstableloader job.
Unfortunately some of the nodes restarted since streaming began.
I see streaming has stopped for those nodes.
Can I restart that streaming somehow?
Or if I restart the sstableloader job, will it start from the beginning?

Regards,
Osman




--

Adam Hutson
Data Architect | DataScale
+1 (417) 224-5212
a...@datascale.io






Re: sstableloader question

2016-10-10 Thread Rajath Subramanyam
Hi Osman,

You cannot restart the streaming only to the failed nodes specifically. You
can restart the sstableloader job itself. Compaction will eventually take
care of the redundant rows.

- Rajath


Rajath Subramanyam


On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson  wrote:

> It'll start over from the beginning.
>
>
> On Sunday, October 9, 2016, Osman YOZGATLIOGLU <
> osman.yozgatlio...@krontech.com> wrote:
>
>> Hello,
>>
>> I have a running sstableloader job.
>> Unfortunately some of the nodes restarted since streaming began.
>> I see streaming has stopped for those nodes.
>> Can I restart that streaming somehow?
>> Or if I restart the sstableloader job, will it start from the beginning?
>>
>> Regards,
>> Osman
>>
>>
>>
>
>
> --
>
> Adam Hutson
> Data Architect | DataScale
> +1 (417) 224-5212
> a...@datascale.io
>


Re: sstableloader question

2016-10-09 Thread Adam Hutson
It'll start over from the beginning.

On Sunday, October 9, 2016, Osman YOZGATLIOGLU <
osman.yozgatlio...@krontech.com> wrote:

> Hello,
>
> I have a running sstableloader job.
> Unfortunately some of the nodes restarted since streaming began.
> I see streaming has stopped for those nodes.
> Can I restart that streaming somehow?
> Or if I restart the sstableloader job, will it start from the beginning?
>
> Regards,
> Osman
>
>
>


-- 

Adam Hutson
Data Architect | DataScale
+1 (417) 224-5212
a...@datascale.io


Re: sstableloader

2016-08-17 Thread Jean Tremblay
Thank you for your answer Kai.

On 17 Aug 2016, at 11:34, Kai Wang wrote:

yes, you are correct.

On Tue, Aug 16, 2016 at 2:37 PM, Jean Tremblay wrote:
Hi,

I’m using Cassandra 3.7.

In the documentation for sstableloader I read the following:

<< Note: To get the best throughput from SSTable loading, you can use multiple 
instances of sstableloader to stream across multiple machines. No hard limit 
exists on the number of SSTables that sstableloader can run at the same time, 
so you can add additional loaders until you see no further improvement.>>

Does this mean that I can stream my sstables to my cluster from many instance 
of sstableloader running simultaneously on many client machines?

I ask because I would like to improve the transfer speed of my sstables to my 
cluster.

Kind regards and thanks for your comments.

Jean




Re: sstableloader

2016-08-17 Thread Kai Wang
yes, you are correct.

On Tue, Aug 16, 2016 at 2:37 PM, Jean Tremblay <
jean.tremb...@zen-innovations.com> wrote:

> Hi,
>
> I’m using Cassandra 3.7.
>
> In the documentation for sstableloader I read the following:
>
> << Note: To get the best throughput from SSTable loading, you can use
> multiple instances of sstableloader to stream across multiple machines. No
> hard limit exists on the number of SSTables that sstableloader can run at
> the same time, so you can add additional loaders until you see no further
> improvement.>>
>
> Does this mean that I can stream my sstables to my cluster from many
> instance of sstableloader running simultaneously on many client machines?
>
> I ask because I would like to improve the transfer speed of my sstables to
> my cluster.
>
> Kind regards and thanks for your comments.
>
> Jean
>


Re: sstableloader: Stream failed

2016-05-24 Thread Ralf Steppacher
Thanks for the hint! Indeed I could not telnet to the host. It was the 
listen_address that was not properly configured.

Thanks again!
Ralf


> On 23.05.2016, at 21:01, Paulo Motta  wrote:
> 
> Can you telnet 10.211.55.8 7000? This is the port used for streaming 
> communication with the destination node.
> 
> If not you should check what is the configured storage_port in the 
> destination node and set that in the cassandra.yaml of the source node so 
> it's picked up by sstableloader.
> 



Re: sstableloader: Stream failed

2016-05-23 Thread Paulo Motta
Can you telnet 10.211.55.8 7000? This is the port used for streaming
communication with the destination node.

If not you should check what is the configured storage_port in the
destination node and set that in the cassandra.yaml of the source node so
it's picked up by sstableloader.
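The telnet check suggested above can also be scripted as part of a pre-flight step before running sstableloader. A small sketch (host, port, and timeout values are examples, not from the thread):

```python
import socket

def storage_port_reachable(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to the node's storage_port succeeds,
    i.e. the same check as `telnet <host> 7000`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from the machine where sstableloader runs; a False result while the node is up usually points at a misconfigured listen_address/storage_port or a firewall in between, exactly as it did here.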

2016-05-23 10:48 GMT-03:00 Ralf Steppacher :

> Hello,
>
> I am trying to load the SSTables (from a Titan graph keyspace) of a
> one-node-cluster (C* v2.2.6) into another node, but I cannot figure out how
> to properly use the sstableloader. The target keyspace and table exist in
> the target node. If they do not exist I get a proper error message telling
> me so.
> Providing a cassandra.yaml or not makes no difference.
> The listen_address and rpc_address values in the cassandra.yaml, if
> provided, do not seem to matter (at least the error is always the same).
> Running sstableloader on the C* node itself or another host makes no
> difference.
> Truncating all tables before attempting to load the date makes no
> difference.
>
> The node is up and running:
> INFO  13:41:18 Starting listening for CQL clients on /10.211.55.8:9042...
> INFO  13:41:18 Binding thrift service to /10.211.55.8:9160
> INFO  13:41:18 Listening for thrift clients...
>
>
> The error I am getting is this:
>
> $ ./sstableloader -d 10.211.55.8 -f ../conf/cassandra.yaml -v ~/Downloads/
>
> ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/
> objc[18941]: Class JavaLaunchHelper is implemented in both
> /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/bin/java
> and
> /Library/Java/JavaVirtualMachines/jdk1.8.0_77.jdk/Contents/Home/jre/lib/libinstrument.dylib.
> One of the two will be used. Which one is undefined.
> Established connection to initial hosts
> Opening sstables and calculating sections to stream
> Streaming relevant part of
> /Users/rsteppac/Downloads/ams0002-cassandra-20160523-1035/var/lib/cassandra/data/Titan/edgestore-8bcd2300d0d011e5a3ab233f92747e94/la-1-big-Data.db
> to [/10.211.55.8]
> ERROR 12:57:24 [Stream #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Streaming
> error occurred
> java.net.ConnectException: Connection refused
> at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_77]
> at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_77]
> at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_77]
> at
> sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
> ~[na:1.8.0_77]
> at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
> ~[na:1.8.0_77]
> at
> org.apache.cassandra.tools.BulkLoadConnectionFactory.createConnection(BulkLoadConnectionFactory.java:60)
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at
> org.apache.cassandra.streaming.StreamSession.createConnection(StreamSession.java:248)
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at
> org.apache.cassandra.streaming.ConnectionHandler.initiate(ConnectionHandler.java:83)
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at
> org.apache.cassandra.streaming.StreamSession.start(StreamSession.java:235)
> ~[apache-cassandra-2.2.6.jar:2.2.6]
> at
> org.apache.cassandra.streaming.StreamCoordinator$StreamSessionConnector.run(StreamCoordinator.java:212)
> [apache-cassandra-2.2.6.jar:2.2.6]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_77]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_77]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> progress: total: 100% 0  MB/s(avg: 0 MB/s)WARN  12:57:24 [Stream
> #e4b9cbc0-20e5-11e6-a00f-4b867a050904] Stream failed
> Streaming to the following hosts failed:
> [/10.211.55.8]
> java.util.concurrent.ExecutionException:
> org.apache.cassandra.streaming.StreamException: Stream failed
> at
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
> at
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
> at
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
> at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:115)
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
> at
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
> at
> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
> at
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
> at
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
> at
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
> at
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
> at
> 

Re: sstableloader throughput

2016-01-11 Thread Noorul Islam Kamal Malmiyoda
On Mon, Jan 11, 2016 at 10:25 PM, Jeff Jirsa  wrote:
>
> Make sure streaming throughput isn’t throttled on the destination cluster.
>


How do I do that? Is stream_throughput_outbound_megabits_per_sec the
attribute in cassandra.yaml?

I think we can set that on the fly using nodetool setstreamthroughput

I ran

nodetool setstreamthroughput 0

on the target machine. But that doesn't improve the average throughput.

Thanks and Regards
Noorul

> Stream from more machines (divide sstables between a bunch of machines, run 
> in parallel).
>
>
>
>
>
>
>
> On 1/11/16, 5:21 AM, "Noorul Islam K M"  wrote:
>
>>
>>I have a need to stream data to new cluster using sstableloader. I
>>spawned a machine with 32 cores assuming that sstableloader scaled with
>>respect to cores. But it doesn't look like so.
>>
>>I am getting an average throughput of 18 MB/s which seems to be pretty
>>low (I might be wrong).
>>
>>Is there any way to increase the throughput? OpsCenter data on the target
>>cluster shows very few write requests per second.
>>
>>Thanks and Regards
>>Noorul


Re: sstableloader throughput

2016-01-11 Thread Jeff Jirsa

Make sure streaming throughput isn’t throttled on the destination cluster. 

Stream from more machines (divide sstables between a bunch of machines, run in 
parallel).
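The "divide sstables between a bunch of machines" step can be done greedily by file size, so the parallel sstableloader runs finish at roughly the same time. A sketch under that assumption (function name and inputs are made up for illustration):

```python
def split_for_loaders(sstables, n_loaders):
    """Greedily assign each (path, size_bytes) pair to the loader with the
    smallest running total, largest files first, so parallel sstableloader
    invocations finish at roughly the same time."""
    bins = [{"files": [], "total": 0} for _ in range(n_loaders)]
    for path, size in sorted(sstables, key=lambda p: p[1], reverse=True):
        target = min(bins, key=lambda b: b["total"])
        target["files"].append(path)
        target["total"] += size
    return bins
```

Each resulting bin would then be copied (or symlinked) into its own keyspace/table directory on a separate machine and fed to one sstableloader instance.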







On 1/11/16, 5:21 AM, "Noorul Islam K M"  wrote:

>
>I have a need to stream data to new cluster using sstableloader. I
>spawned a machine with 32 cores assuming that sstableloader scaled with
>respect to cores. But it doesn't look like so.
>
>I am getting an average throughput of 18 MB/s which seems to be pretty
>low (I might be wrong).
>
>Is there any way to increase the throughput? OpsCenter data on the target
>cluster shows very few write requests per second.
>
>Thanks and Regards
>Noorul



Re: sstableloader in version 2.0.14 doesn't honor thrift_framed_transport_size_in_mb set on server side

2015-08-14 Thread Prem Yadav
We had this issue when using Hive on Cassandra.
We had to replace the Thrift jar with our own patched version.
On Fri, Aug 14, 2015 at 5:27 PM, K F kf200...@yahoo.com wrote:

 While using sstableloader in 2.0.14 we have discovered that it doesn't
 honor the thrift_framed_transport_size_in_mb setting of 16 in
 cassandra.yaml. Did anybody see a similar issue?

 So, this is the exception seen,

 org.apache.thrift.transport.TTransportException: Frame size (16165888)
 larger than max length (15728640)!
 java.lang.RuntimeException: Could not retrieve endpoint ranges:
 at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:282)
 at
 org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
 at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:94)
 Caused by: org.apache.thrift.transport.TTransportException: Frame size
 (16165888) larger than max length (15728640)!
 at
 org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
 at
 org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
 at
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
 at
 org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
 at
 org.apache.cassandra.thrift.Cassandra$Client.recv_describe_ring(Cassandra.java:1251)
 at
 org.apache.cassandra.thrift.Cassandra$Client.describe_ring(Cassandra.java:1238)
 at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:258)
 ... 2 more

 On Server side, it's the following.

 2015-08-14 15:10:10,637 [main] INFO ThriftServer Using TFramedTransport
 with a max frame size of 16777216 bytes.
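The numbers in the exception are consistent with the sstableloader client applying its own compiled-in default rather than the server's configured value (my reading of the trace, since the failure happens inside BulkLoader's Thrift client while reading the describe_ring response):

```python
CLIENT_DEFAULT_MAX_FRAME = 15 * 1024 * 1024  # 15728640 — "max length" in the exception
SERVER_MAX_FRAME = 16 * 1024 * 1024          # 16777216 — from the server-side log line
OBSERVED_FRAME = 16165888                    # size of the frame describe_ring returned

# The frame fits under the server's 16 MB limit but exceeds the 15 MB
# default on the client side, hence the client-side TTransportException.
assert OBSERVED_FRAME <= SERVER_MAX_FRAME
assert OBSERVED_FRAME > CLIENT_DEFAULT_MAX_FRAME
```

In other words, raising the value in the server's cassandra.yaml alone cannot fix this; the limit that trips is the one baked into the tool doing the reading.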




Re: sstableloader Could not retrieve endpoint ranges

2015-06-26 Thread Mitch Gitman
I want to follow up on this thread to describe what I was able to get
working. My goal was to switch a cluster to vnodes, in the process
preserving the data for a single table, endpoints.endpoint_messages.
Otherwise, I could afford to start from a clean slate. As should be
apparent, I could also afford to do this within a maintenance window where
the cluster was down. In other words, I had the luxury of not having to add
a new data center to a live cluster per DataStax's documented procedure to
enable vnodes:
http://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configVnodesProduction_t.html
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configVnodesProduction_t.html

What I got working relies on the nodetool snapshot command to create
various SSTable snapshots under
endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME. The snapshots
represent the data being backed up and restored from. The backup and
restore is not directly, literally working against the original SSTables
directly in various endpoints/endpoint_messages/ directories.

   - endpoints/endpoint_messages/snapshots/SNAPSHOT_NAME/: These SSTables
   are being copied off and restored from.
   - endpoints/endpoint_messages/: These SSTables are obviously the source
   of the snapshots but are not being copied off and restored from.

Instead of using sstableloader to load the snapshots into the
re-initialized Cassandra cluster, I used the JMX StorageService.bulkLoad
command after establishing a JConsole session to each node. I copied off
the snapshots to load to a directory path that ends with
endpoints/endpoint_messages/ to give the bulk-loader a path it expects. The
directory path that is the destination for nodetool snapshot and the source
for StorageService.bulkLoad is on the same host as the Cassandra node but
outside the purview of the Cassandra node.

This procedure can be summarized as follows:
1. For each node, create a snapshot of the endpoint_messages table as a
backup.
2. Stop the cluster.
3. On each node, wipe all the data, i.e. the contents of
data_files_directories, commitlog, and saved_caches.
4. Deploy the cassandra.yaml configuration that makes the switch to vnodes
and restart the cluster to apply the vnodes change.
5. Re-create the endpoints keyspace.
6. On each node, bulk-load the snapshots for that particular node.

This summary can be reduced even further:
1. On each node, export the data to preserve.
2. On each node, wipe the data.
3. On all nodes, switch to vnodes.
4. On each node, import back in the exported data.

I'm sure this process could have been streamlined.

One caveat for anyone looking to emulate this: Our situation might have
been a little easier to reason about because our original endpoint_messages
table had a replication factor of 1. We used the vnodes switch as an
opportunity to up the RF to 3.

I can only speculate as to why what I was originally attempting wasn't
working. But that wasn't precisely the use case I care about, whereas what
I'm following up with now is.

Re: sstableloader Could not retrieve endpoint ranges

2015-06-19 Thread Mitch Gitman
Fabien, thanks for the reply. We do have Thrift enabled. From what I can
tell, the "Could not retrieve endpoint ranges:" error crops up under various
circumstances.

From further reading on sstableloader, it occurred to me that it might be a
safer bet to use the JMX StorageService bulkLoad command, considering that
the data to import was already on one of the Cassandra nodes, just in an
arbitrary directory outside the Cassandra data directories.

I was able to get this bulkLoad command to fail with a message that the
directory structure did not follow the expected keyspace/table/ pattern. So
I created a keyspace directory and then a table directory within that and
moved all the files under the table directory. Executed bulkLoad, passing
in that directory. It succeeded.
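For reference, a minimal sketch of preparing that keyspace/table directory layout before invoking bulkLoad (all paths here are hypothetical examples, not the poster's actual paths):

```shell
#!/bin/sh
# Sketch only: arrange loose snapshot files into the <keyspace>/<table>
# layout that StorageService.bulkLoad expects. Paths are hypothetical.
set -e

SRC=/srv/snapshot-files                      # flat directory of SSTable files
DEST=/tmp/bulkload/mykeyspace/mytable        # <keyspace>/<table> layout

mkdir -p "$DEST"
# All components of each SSTable (Data, Index, Filter, ...) must move together.
cp "$SRC"/*.db "$DEST"/ 2>/dev/null || true

ls -l "$DEST"
```

With the layout in place, bulkLoad can be invoked from a jconsole JMX session against the StorageService MBean, passing the table directory path, as described in this thread.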

Then I went and ran a nodetool refresh on the table in question.

Only one problem. If I then went to query the table for, well, anything,
nothing came back. And this was after successfully querying the table
before and truncating the table just prior to the bulkLoad, so that I knew
that only the data coming from the bulkLoad could show up there.

Oh, and for good measure, I stopped and started all the nodes too. No luck
still.

What's puzzling about this is that the bulkLoad silently succeeds, even
though it doesn't appear to be doing anything. I haven't bothered yet to
check the Cassandra logs.

On Fri, Jun 19, 2015 at 12:28 AM, Fabien Rousseau fabifab...@gmail.com
wrote:

 Hi,

 I already got this error on a 2.1 cluster because Thrift was disabled, so
 you should check that Thrift is enabled and accessible from the
 sstableloader process.

 Hope this helps

 Fabien
 On 19 June 2015 at 05:44, Mitch Gitman mgit...@gmail.com wrote:

 I'm using sstableloader to bulk-load a table from one cluster to another.
 I can't just copy sstables because the clusters have different topologies.
 While we're looking to upgrade soon to Cassandra 2.0.x, we're on Cassandra
 1.2.19. The source data comes from a nodetool snapshot.

 Here's the command I ran:
 sstableloader -d *IP_ADDRESSES_OF_SEED_NODES* */SNAPSHOT_DIRECTORY/*

 Here's the result I got:
 Could not retrieve endpoint ranges:
  -pr,--principal   kerberos principal
  -k,--keytab   keytab location
  --ssl-keystoressl keystore location
  --ssl-keystore-password   ssl keystore password
  --ssl-keystore-type   ssl keystore type
  --ssl-truststore  ssl truststore location
  --ssl-truststore-password ssl truststore password
  --ssl-truststore-type ssl truststore type

 Not sure what to make of this, what with the hints at security arguments
 that pop up. The source and destination clusters have no security.

 Hoping this might ring a bell with someone out there.




Re: sstableloader Could not retrieve endpoint ranges

2015-06-19 Thread Mitch Gitman
I checked the system.log for the Cassandra node that I did the jconsole JMX
session against and which had the data to load. Lot of log output
indicating that it's busy loading the files. Lot of stacktraces indicating
a broken pipe. I have no reason to believe there are connectivity issues
between the nodes, but verifying that is beyond my expertise. What's
indicative is this last bit of log output:
 INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,441
StreamReplyVerbHandler.java (line 44) Successfully sent
/srv/cas-snapshot-06-17-2015/endpoints/endpoint_messages/endpoints-endpoint_messages-ic-34-Data.db
to /10.205.55.101
 INFO [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,457
OutputHandler.java (line 42) Streaming session to /10.205.55.101 failed
ERROR [Streaming to /10.205.55.101:5] 2015-06-19 21:20:45,458
CassandraDaemon.java (line 253) Exception in thread Thread[Streaming to /
10.205.55.101:5,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Broken pipe
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:433)
at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:565)
at
org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
at
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 3 more

And then right after that I see what appears to be the output from the
nodetool refresh:
 INFO [RMI TCP Connection(2480)-10.2.101.114] 2015-06-19 21:22:56,877
ColumnFamilyStore.java (line 478) Loading new SSTables for
endpoints/endpoint_messages...
 INFO [RMI TCP Connection(2480)-10.2.101.114] 2015-06-19 21:22:56,878
ColumnFamilyStore.java (line 524) No new SSTables were found for
endpoints/endpoint_messages

Notice that Cassandra hasn't found any new SSTables, even though it was
just so busy loading them.

What's also noteworthy is that the output from the originating node shows
it successfully sent endpoints-endpoint_messages-ic-34-Data.db to another
node. But then in the system.log for that destination node, I see no
mention of that file. What I do see on the destination node are a few INFO
messages about streaming one of the .db files, and every time that's
immediately followed by an error message:
 INFO [Thread-108] 2015-06-19 21:20:45,453 StreamInSession.java (line 142)
Streaming of file
/srv/cas-snapshot-06-17-2015/endpoints/endpoint_messages/endpoints-endpoint_messages-ic-26-Data.db
sections=1 progress=0/105137329 - 0% for
org.apache.cassandra.streaming.StreamInSession@46c039ef failed: requesting
a retry.
ERROR [Thread-109] 2015-06-19 21:20:45,456 CassandraDaemon.java (line 253)
Exception in thread Thread[Thread-109,5,main]
java.lang.RuntimeException: java.nio.channels.AsynchronousCloseException
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.AsynchronousCloseException
at
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:412)
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:203)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at
org.apache.cassandra.streaming.compress.CompressedInputStream$Reader.runMayThrow(CompressedInputStream.java:151)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 1 more

I don't know, I'm seeing enough flakiness here as to consider Cassandra
bulk-loading a lost cause, even if there is something wrong and fixable
about my particular cluster. On to exporting and re-importing data at the
proprietary application level. Life is too short.


Re: sstableloader Could not retrieve endpoint ranges

2015-06-19 Thread Fabien Rousseau
Hi,

I already got this error on a 2.1 cluster because Thrift was disabled, so
you should check that Thrift is enabled and accessible from the
sstableloader process.
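A quick way to check this from the machine running sstableloader is to probe the Thrift port (9160 by default) on each node. A sketch, with placeholder addresses:

```shell
#!/bin/sh
# Probe a TCP port to see whether Thrift (9160 by default) is reachable.
# Hosts below are placeholders -- substitute your cluster's addresses.
check_port() {
  host=$1; port=$2
  if timeout 2 bash -c "exec </dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port reachable"
  else
    echo "$host:$port UNREACHABLE"
  fi
}

for h in 10.0.0.1 10.0.0.2 10.0.0.3; do
  check_port "$h" 9160
done
```

If a node shows UNREACHABLE, check rpc_address and any firewall rules before retrying the load.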

Hope this helps

Fabien
On 19 June 2015 at 05:44, Mitch Gitman mgit...@gmail.com wrote:

 I'm using sstableloader to bulk-load a table from one cluster to another.
 I can't just copy sstables because the clusters have different topologies.
 While we're looking to upgrade soon to Cassandra 2.0.x, we're on Cassandra
 1.2.19. The source data comes from a nodetool snapshot.

 Here's the command I ran:
 sstableloader -d *IP_ADDRESSES_OF_SEED_NODES* */SNAPSHOT_DIRECTORY/*

 Here's the result I got:
 Could not retrieve endpoint ranges:
  -pr,--principal   kerberos principal
  -k,--keytab   keytab location
  --ssl-keystoressl keystore location
  --ssl-keystore-password   ssl keystore password
  --ssl-keystore-type   ssl keystore type
  --ssl-truststore  ssl truststore location
  --ssl-truststore-password ssl truststore password
  --ssl-truststore-type ssl truststore type

 Not sure what to make of this, what with the hints at security arguments
 that pop up. The source and destination clusters have no security.

 Hoping this might ring a bell with someone out there.



Re: sstableloader usage doubts

2015-06-09 Thread ZeroUno

On 08/06/15 20:11, Robert Coli wrote:


On Mon, Jun 8, 2015 at 6:58 AM, ZeroUno zerozerouno...@gmail.com wrote:

So... if I stop the two nodes on the first DC, restore their
sstables' files, and then restart the nodes, nothing else needs to
be done on the first DC?

Be careful to avoid bootstrapping, but yes.


What do you mean?
As far as I read from the docs, bootstrapping happens when adding a new 
node to the cluster, but in my situation the nodes already exist, I'm 
only adding data back into them.


Also I have all 4 nodes configured as seeds in cassandra.yaml, so if I'm 
not wrong this should prevent them from auto-bootstrapping.
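For what it's worth, a cassandra.yaml sketch of that seed setup (addresses are hypothetical); a node listed in its own seed list will not attempt to auto-bootstrap:

```yaml
# cassandra.yaml fragment (node addresses are hypothetical examples)
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4"

# Optional belt-and-braces during a restore:
auto_bootstrap: false
```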


Thanks.

Marco

--
01



Re: sstableloader usage doubts

2015-06-09 Thread Robert Coli
On Tue, Jun 9, 2015 at 1:48 AM, ZeroUno zerozerouno...@gmail.com wrote:

 As far as I read from the docs, bootstrapping happens when adding a new
 node to the cluster, but in my situation the nodes already exist, I'm only
 adding data back into them.


If you don't have the contents of the system keyspace, there is a non-zero
chance of you bootstrapping in some cases.


 Also I have all 4 nodes configured as seeds in cassandra.yaml, so if I'm
 not wrong this should prevent them from auto-bootstrapping.


Yes.

=Rob


Re: sstableloader usage doubts

2015-06-08 Thread Robert Coli
On Mon, Jun 8, 2015 at 6:58 AM, ZeroUno zerozerouno...@gmail.com wrote:

 So you mean that refresh needs to be used if the cluster is running, but
 if I stopped Cassandra while copying the sstables then refresh is useless?
 So the error "No new SSTables were found" during my refresh attempt is due
 to the fact that the sstables in my data dir were not new because they were
 already loaded, and not to the files not being found?


Yes. You should be able to see logs of it opening the files it finds in the
data dir.


 So... if I stop the two nodes on the first DC, restore their sstables'
 files, and then restart the nodes, nothing else needs to be done on the
 first DC?


Be careful to avoid bootstrapping, but yes.


 And on the second DC instead I just need to do nodetool rebuild --
 FirstDC on _both_ nodes?


Yes.
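Putting the answers in this thread together, the whole restore could be scripted roughly as follows (hostnames, paths, and service commands are hypothetical; with DRY_RUN set, each step is only printed):

```shell
#!/bin/sh
# Sketch of the restore sequence: restore DC1 from snapshots, then rebuild DC2.
# Hostnames/paths are hypothetical; with DRY_RUN set, commands are only printed.
run() { echo "+ $*"; [ -n "${DRY_RUN:-}" ] || "$@"; }
DRY_RUN=1

# 1. On each node in the first DC: stop, restore sstables, restart.
for node in nodeA1 nodeA2; do
  run ssh "$node" 'service cassandra stop'
  run ssh "$node" 'cp -a /backup/mykeyspace/. /mydatapath/cassandra/data/mykeyspace/'
  run ssh "$node" 'service cassandra start'
done

# 2. On each node in the second DC: stream the data over from the first DC.
for node in nodeB1 nodeB2; do
  run ssh "$node" 'nodetool rebuild -- FirstDC'
done
```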

=Rob


Re: sstableloader usage doubts

2015-06-08 Thread ZeroUno

On 05/06/15 22:40, Robert Coli wrote:


On Fri, Jun 5, 2015 at 7:53 AM, Sebastian Estevez
sebastian.este...@datastax.com wrote:

Since you only restored one dc's sstables, you should be able to
rebuild them on the second DC.

Refresh means pick up new SSTables that have been directly added to
the data directory.

Rebuild means stream data from other replicas to re create SSTables
from scratch.

Sebastian's response is correct; use rebuild. Sorry that I missed that
specific aspect of your question!


Thank you both.

So you mean that refresh needs to be used if the cluster is running, 
but if I stopped Cassandra while copying the sstables then refresh is 
useless? So the error "No new SSTables were found" during my refresh 
attempt is due to the fact that the sstables in my data dir were not 
new because they were already loaded, and not to the files not being found?


So... if I stop the two nodes on the first DC, restore their sstables' 
files, and then restart the nodes, nothing else needs to be done on the 
first DC?


And on the second DC instead I just need to do nodetool rebuild -- 
FirstDC on _both_ nodes?


--
01



Re: sstableloader usage doubts

2015-06-05 Thread ZeroUno

On 04/06/15 17:17, Sebastian Estevez wrote:


If you have all the sstables for each node and no token range changes,
you can just move the sstables to their spot in the data directory
(rsync or w/e) and bring up your nodes. If you're already up you can use
nodetool refresh to load the sstables.


Hi, as previously described, in my situation I have the sstables for 
only TWO of my four nodes, i.e. I have a backup of one datacenter only.


I tried stopping cassandra on all four nodes, copying the sstables to 
their original location on the two nodes for which I have a backup, and 
restarting cassandra on all four nodes, but the data did not propagate 
to the second datacenter: the two nodes where I restored the backup 
appeared to be OK, but the two nodes in the other datacenter remained empty.

Am I missing anything?

Also, I tried nodetool refresh with no success.
First of all, on which nodes should I run it?
I tried running it on the nodes where I restored the sstables, but it 
exited without any output, and in the log I could see "No new SSTables 
were found for mykeyspace/mytablename"; it didn't do anything.
I'm pretty sure I restored the data in the right place, not in the 
snapshot subdirs.


Thanks.

--
01



Re: sstableloader usage doubts

2015-06-05 Thread ZeroUno

On 04/06/15 19:50, Robert Coli wrote:


http://www.pythian.com/blog/bulk-loading-options-for-cassandra/


Thank you Rob, but actually it doesn't matter to me which method is 
used, I can use both nodetool refresh or sstableloader, as long as they 
work! ;-)


My problem here is that it looks like all my various attempts are 
failing, one way or another (see also my reply to Sebastian).


Marco.

--
01



Re: sstableloader usage doubts

2015-06-05 Thread Robert Coli
On Fri, Jun 5, 2015 at 7:53 AM, Sebastian Estevez 
sebastian.este...@datastax.com wrote:

 Since you only restored one dc's sstables, you should be able to rebuild
 them on the second DC.

 Refresh means pick up new SSTables that have been directly added to the
 data directory.

 Rebuild means stream data from other replicas to re create SSTables from
 scratch.


Sebastian's response is correct; use rebuild. Sorry that I missed that
specific aspect of your question!

=Rob


Re: sstableloader usage doubts

2015-06-05 Thread Sebastian Estevez
Since you only restored one dc's sstables, you should be able to rebuild
them on the second DC.

Refresh means pick up new SSTables that have been directly added to the
data directory.

Rebuild means stream data from other replicas to re create SSTables from
scratch.
On Jun 5, 2015 6:40 AM, ZeroUno zerozerouno...@gmail.com wrote:

 On 04/06/15 19:50, Robert Coli wrote:

  http://www.pythian.com/blog/bulk-loading-options-for-cassandra/


 Thank you Rob, but actually it doesn't matter to me which method is used,
 I can use both nodetool refresh or sstableloader, as long as they work! ;-)

 My problem here is that it looks like all my various attempts are failing,
 one way or another (see also my reply to Sebastian).

 Marco.

 --
 01




Re: sstableloader usage doubts

2015-06-04 Thread Robert Coli
On Thu, Jun 4, 2015 at 5:39 AM, ZeroUno zerozerouno...@gmail.com wrote:

 while defining backup and restore procedures for a Cassandra cluster I'm
 trying to use sstableloader for restoring a snapshot from a backup, but I'm
 not sure I fully understand the documentation on how it should be used.


http://www.pythian.com/blog/bulk-loading-options-for-cassandra/

=Rob


Re: sstableloader usage doubts

2015-06-04 Thread Sebastian Estevez
You don't need sstableloader if your topology hasn't changed and you have
all your sstables backed up for each node. sstableloader actually streams
data to all the nodes in a ring (this is what OpsCenter backup restore
does). So you can actually restore to a larger or smaller cluster or a
cluster with different token ranges / vnodes vs. non vnodes etc. It also
requires all your nodes to be up.

If you have all the sstables for each node and no token range changes, you
can just move the sstables to their spot in the data directory (rsync or
w/e) and bring up your nodes. If you're already up you can use nodetool
refresh to load the sstables.

http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsRefresh.html
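A sketch of the "already up" path described above (keyspace, table, and paths are hypothetical; with DRY_RUN set, the commands are printed rather than run):

```shell
#!/bin/sh
# Copy restored SSTables into the live data directory, then have Cassandra
# pick them up without a restart. Paths/names are hypothetical placeholders.
run() { echo "+ $*"; [ -n "${DRY_RUN:-}" ] || "$@"; }
DRY_RUN=1

run rsync -av /backup/snapshots/mykeyspace/mytable1/ \
    /mydatapath/cassandra/data/mykeyspace/mytable1/
run nodetool refresh mykeyspace mytable1
```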


All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Thu, Jun 4, 2015 at 5:39 AM, ZeroUno zerozerouno...@gmail.com wrote:

 Hi,
 while defining backup and restore procedures for a Cassandra cluster I'm
 trying to use sstableloader for restoring a snapshot from a backup, but I'm
 not sure I fully understand the documentation on how it should be used.

 Looking at the examples in the doc at
 http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsBulkloader_t.html
 it seems like the path_to_keyspace to be passed as an argument is exactly
 the Cassandra data directory. So you first move the data to its final
 target location and then stream it back into the cluster?

 Let's do a step back. My cluster is composed of two data centers. Each
 data center has two nodes (nodeA1, nodeA2 for center A, nodeB1, nodeB2 for
 center B).
 I'm using NetworkTopologyStrategy with RF=2.

 For doing periodic backups I'm creating a snapshot on two nodes
 simultaneously in a single data center (nodeA1 and nodeA2), and then moving
 the snapshot files in a safe place.
 To simulate a disaster recovery situation, I truncate all tables to erase
 data (but not the schema which would be re-created anyway by my
 application), I stop cassandra on all 4 nodes, I move the snapshot backup
 files in their original locations (e.g.
 /mydatapath/cassandra/data/mykeyspace/mytable1/) on nodeA1 and nodeA2, then
 I restart cassandra on all 4 nodes.

 At last, I run:

  sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2
 /mydatapath/cassandra/data/mykeyspace/mytable1/
 sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2
 /mydatapath/cassandra/data/mykeyspace/mytable2/
 sstableloader -d nodeA1,nodeA2,nodeB1,nodeB2
 /mydatapath/cassandra/data/mykeyspace/mytable3/
 [...and so on for all tables]


 ...on both nodeA1 and nodeA2, where I restored the snapshot.

 Is that correct?

 I observed some strange behaviour after doing this: when I truncated
 tables again, a select count(*) on one of the A nodes still returned a
 non-zero number, as if data was still there.
 I started thinking that maybe the source sstable directory for
 sstableloader should not be the data directory itself, as this causes some
 kind of double-data problem...

 Can anyone please tell me if this is the correct way to proceed?
 Thank you very much!

 --
 01
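The per-table invocations listed in the message above can be driven by a loop over the keyspace's table directories. A sketch using the poster's placeholder paths (with DRY_RUN set, each command is only printed):

```shell
#!/bin/sh
# Run sstableloader once per table directory under the keyspace data dir.
# Node names and paths are the poster's placeholders; DRY_RUN only prints.
run() { echo "+ $*"; [ -n "${DRY_RUN:-}" ] || "$@"; }
DRY_RUN=1

NODES=nodeA1,nodeA2,nodeB1,nodeB2
for table_dir in /mydatapath/cassandra/data/mykeyspace/*/; do
  run sstableloader -d "$NODES" "$table_dir"
done
```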




Re: sstableloader and ttls

2014-08-18 Thread Yuki Morishita
sstableloader just loads given SSTables as they are.
TTLed columns are sent and will be compacted at the destination node eventually.

On Sat, Aug 16, 2014 at 4:28 AM, Erik Forsberg forsb...@opera.com wrote:
 Hi!

 If I use sstableloader to load data to a cluster, and the source
 sstables contain some columns where the TTL has expired, i.e. the
 sstable has not yet been compacted - will those entries be properly
 removed on the destination side?

 Thanks,
 \EF



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: sstableloader prints nothing

2013-12-26 Thread Tyler Hobbs
On Wed, Dec 25, 2013 at 11:29 AM, Andrey Razumovsky 
razumovsky.and...@gmail.com wrote:

 OK, I  figured that out - turns out that my sstables were in directory
 keyspace_name but not in keyspace_name/family_name. Would be great to
 have a proper error message here..


I've opened a ticket to fix this:
https://issues.apache.org/jira/browse/CASSANDRA-6529


However, I still can't import the data. The exception I get on server now
looks like this:
 WARN [STREAM-IN-/127.0.1.1] 2013-12-25 18:20:09,686 StreamSession.java
(line 519) [Stream #4ec06a70-6d6e-11e3-85ae-9b0764b01181] Retrying for
following error
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
 at
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
at
org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
at
org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:255)
at
org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:134)
at
org.apache.cassandra.streaming.StreamReader.read(StreamReader.java:88)
at
org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:55)
at
org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:45)
at
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
at java.lang.Thread.run(Thread.java:724)

What is your schema for this table?

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: SSTableloader

2013-12-26 Thread Robert Coli
On Thu, Dec 26, 2013 at 3:03 PM, varun allampalli
vshoori.off...@gmail.comwrote:

 I am trying to load about a million records using sstableloader with
 Cassandra 1.2. It streams very fast, but in the end the streaming gets
 stuck at two or three machines in the cluster; the rest are all 100% done.


The fragility of streaming is why streaming protocol was re-written for 2.0.


 Has anybody seen such a problem, and is there any tool I can use to
 diagnose this loading?


You can use the --ignore functionality of sstableloader to restart the load
and skip the two nodes which have already completed.

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob


Re: sstableloader prints nothing

2013-12-25 Thread Andrey Razumovsky
OK, I  figured that out - turns out that my sstables were in directory
keyspace_name but not in keyspace_name/family_name. Would be great to
have a proper error message here..

However, I still can't import the data. The exception I get on server now
looks like this:
 WARN [STREAM-IN-/127.0.1.1] 2013-12-25 18:20:09,686 StreamSession.java
(line 519) [Stream #4ec06a70-6d6e-11e3-85ae-9b0764b01181] Retrying for
following error
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:55)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:64)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:130)
at
org.apache.cassandra.io.sstable.ColumnNameHelper.minComponents(ColumnNameHelper.java:103)
at
org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:255)
at
org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:134)
at
org.apache.cassandra.streaming.StreamReader.read(StreamReader.java:88)
at
org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:55)
at
org.apache.cassandra.streaming.messages.FileMessage$1.deserialize(FileMessage.java:45)
at
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
at java.lang.Thread.run(Thread.java:724)


Googling didn't show me the way.. Could anyone help me with this please?

Thanks,
Andrey


2013/12/25 Andrey Razumovsky razumovsky.and...@gmail.com

 Hi everyone,

 I'm trying to use sstableloader to import large amounts of data into my
 cassandra-2.0.3 instance (single node). I've created the sstables directory
 and am now running
   sstableloader -d localhost path_to_tables

 but the process just starts and prints nothing at all! I've no idea
 whether it is doing anything - is there a way to see the output and
 progress? Or is the process just hanging?

 Thanks,
 Andrey



Re: sstableloader does not support client encryption on Cassandra 2.0?

2013-11-19 Thread Tyler Hobbs
I think this is just an oversight; would you mind opening a ticket here?
https://issues.apache.org/jira/browse/CASSANDRA


On Mon, Nov 18, 2013 at 12:37 PM, David Laube d...@stormpath.com wrote:

 Hi All,

 We have been testing backup/restore from one ring to another and we
 recently stumbled upon an issue with sstableloader. When client_enc_enable:
 true, the exception below is generated. When client_enc_enable is set to
 false, the sstableloader is able to get to the point where it discovers
 endpoints, connects to stream data, etc.

 ==BEGIN EXCEPTION==
  sstableloader --debug -d x.x.x.248,x.x.x.108,x.x.x.113
 /tmp/import/keyspace_name/columnfamily_name
 Exception in thread main java.lang.RuntimeException: Could not retrieve
 endpoint ranges:
 at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:226)
 at
 org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
 at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:68)
 Caused by: org.apache.thrift.transport.TTransportException: Frame size
 (352518400) larger than max length (16384000)!
 at
 org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
 at
 org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:362)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:284)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:191)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
 at
 org.apache.cassandra.thrift.Cassandra$Client.recv_describe_partitioner(Cassandra.java:1292)
 at
 org.apache.cassandra.thrift.Cassandra$Client.describe_partitioner(Cassandra.java:1280)
 at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:199)
 ... 2 more
 ==END EXCEPTION==


 Has anyone seen this before, or can someone confirm that SSL/encryption is
 not supported under the open source project and only with DataStax Enterprise?

 Thanks,
 -David Laube




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: sstableloader does not support client encryption on Cassandra 2.0?

2013-11-19 Thread David Laube
Thank you Tyler. I took your advice and I have opened 
https://issues.apache.org/jira/browse/CASSANDRA-6378

Best regards,
-David Laube

On Nov 19, 2013, at 9:51 AM, Tyler Hobbs ty...@datastax.com wrote:

 I think this is just an oversight; would you mind opening a ticket here? 
 https://issues.apache.org/jira/browse/CASSANDRA
 
 



Re: sstableloader in Counter Columns

2013-11-11 Thread Aaron Morton
 After truncating and running sstableloader it was:
 
 [default@Sessions] get Counters[0];
 = (counter=EVENTS, value=4809758)
 = (counter=ITEMS, value=382473)
 = (counter=USERS, value=2571674)
The expected value was 395930 and you got 382473 instead? 
Were there any errors/warnings in the log about counter shards?



 Then, I've performed another test. I've truncated the CF, then incremented 
 the row 'ITEMS' by 1.000.000 and then run sstableloader. Unfortunately, the 
 result was:
 
 [default@Sessions] get Counters[0];
 = (counter=EVENTS, value=4809758)
 = (counter=ITEMS, value=382473)
 = (counter=USERS, value=2571674)

What was the expected value? 

 The counter column family had only one row, so I chose one node and performed 
 nodetool snapshot. Then, I've truncated this CF.
The counter value is broken up into shards; each node that is a replica for the 
row maintains its own shard and replicates it to the other nodes. 
If your cluster was dropping messages and you took the snapshot from only one 
machine, it's possible you did not get a consistent view of the data. 

 Is this the normal behaviour of sstableloader for counter column families? If 
 that is the case, we cannot run sstableloader in a live cluster, can we?
I would be interested to know if you get the same result when using snapshots 
from all nodes. 

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder  Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 8/11/2013, at 2:08 am, Francisco Nogueira Calmon Sobral 
fsob...@igcorp.com.br wrote:

 Hi, all!
 
 I've performed a test in my cluster, regarding the sstableloader behaviour on 
 counter column families. The test cluster has 3 nodes running Cassandra 1.2.3 
 with RF=3. The machine that sstableloader ran on had Cassandra 1.2.11.
 
 The counter column family had only one row, so I chose one node and performed 
 nodetool snapshot. Then, I've truncated this CF.
 
 Before truncating, this CF was:
 
 RowKey: 0
 = (counter=EVENTS, value=4816777)
 = (counter=ITEMS, value=395930)
 = (counter=USERS, value=2574764)
 
 After truncating and running sstableloader it was:
 
 [default@Sessions] get Counters[0];
 = (counter=EVENTS, value=4809758)
 = (counter=ITEMS, value=382473)
 = (counter=USERS, value=2571674)
 
 Then, I've performed another test. I've truncated the CF, then incremented 
 the row 'ITEMS' by 1.000.000 and then run sstableloader. Unfortunately, the 
 result was:
 
 [default@Sessions] get Counters[0];
 = (counter=EVENTS, value=4809758)
 = (counter=ITEMS, value=382473)
 = (counter=USERS, value=2571674)
 
 
 
 Is this the normal behaviour of sstableloader for counter column families? If 
 that is the case, we cannot run sstableloader in a live cluster, can we?
 
 Best regards!
 Francisco
 
 



RE: 'sstableloader' is not recognized as an internal or external command,

2013-04-23 Thread Viktor Jevdokimov
If your Cassandra cluster is on Linux, I believe that streaming is not 
supported in a mixed environment, i.e. Cassandra nodes can't stream between 
Windows and Linux and sstableloader can't stream from Windows to Linux.

If your Cassandra is also on Windows, just try to create a bat file for 
sstableloader, using the other bat files as examples.
I don't know if sstableloader will support the Windows directory structure.



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
Take a ride with Adform's Rich Media Suite (http://vimeo.com/adform/richmedia)



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: Techy Teck [mailto:comptechge...@gmail.com]
Sent: Tuesday, April 23, 2013 09:10
To: user
Subject: 'sstableloader' is not recognized as an internal or external command,

I have a bunch of `SSTables` with me that I got from somebody within my team. Now 
I was trying to push those `SSTABLES` into the `Cassandra database`.

I created corresponding keyspace and column family successfully.

Now as soon as I execute `SSTableLoader` command, I always get below exception?


S:\Apache Cassandra\apache-cassandra-1.2.3\bin>sstableloader
C:\CassandraClient-LnP\20130405\profileks\PROFILECF 'sstableloader' is
not recognized as an internal or external command, operable program or
batch file.


Can anyone tell me what I am doing wrong here? I am running Cassandra 1.2.3, 
and this is my first time working with `SSTableLoader`. I am working in a Windows 
environment.

Is sstableloader supported on Windows? Looking at the source, it seems to be a 
unix shell file.

Re: 'sstableloader' is not recognized as an internal or external command,

2013-04-23 Thread aaron morton
 Is sstableloader supported on Windows? Looking at the source, it seems to be a 
 unix shell file.

Yup. 
If you would like to put together an sstableloader.bat file use the 
sstablekeys.bat file as a template but use 
org.apache.cassandra.tools.BulkLoader and the CASSANDRA_MAIN

If you can get it working please raise a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA and donate it back to Apache. 
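Following that suggestion, such an sstableloader.bat might look roughly like this. This is an untested sketch: the variable names and the `cassandra.in.bat` call are assumed to match the other .bat files shipped in the same bin\ directory, so verify them against your install before use:

```bat
@echo off
rem Hypothetical sstableloader.bat, modelled on sstablekeys.bat (untested sketch).
if "%OS%" == "Windows_NT" setlocal

pushd %~dp0..
if NOT DEFINED CASSANDRA_HOME set CASSANDRA_HOME=%CD%
popd

rem Reuse the classpath setup shared by the other command-line tools
call "%CASSANDRA_HOME%\bin\cassandra.in.bat"

rem The only real change: point CASSANDRA_MAIN at the bulk loader
set CASSANDRA_MAIN=org.apache.cassandra.tools.BulkLoader

"%JAVA_HOME%\bin\java" -cp "%CASSANDRA_CLASSPATH%" %CASSANDRA_MAIN% %*
```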

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 23/04/2013, at 6:37 PM, Viktor Jevdokimov viktor.jevdoki...@adform.com 
wrote:

 If your Cassandra cluster is on Linux, I believe that streaming is not 
 supported in a mixed environment, i.e. Cassandra nodes can’t stream between 
 Windows and Linux and sstableloader can’t stream from Windows to Linux.
  
 If your Cassandra is also on Windows, just try to create a bat file for 
 sstableloader, using the other bat files as examples.
 I don’t know if sstableloader will support the Windows directory structure.
  
  
  
 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer
 
 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider
 Take a ride with Adform's Rich Media Suite
 
 
 From: Techy Teck [mailto:comptechge...@gmail.com] 
 Sent: Tuesday, April 23, 2013 09:10
 To: user
 Subject: 'sstableloader' is not recognized as an internal or external command,
  
 I have bunch of `SSTables` with me that I got from somebody within my team. 
 Now I was trying to push those `SSTABLES` into `Cassandra database`.
 
 I created corresponding keyspace and column family successfully.
 
 Now as soon as I execute `SSTableLoader` command, I always get below 
 exception?
 
 
 S:\Apache Cassandra\apache-cassandra-1.2.3\bin>sstableloader
 C:\CassandraClient-LnP\20130405\profileks\PROFILECF 'sstableloader' is
 not recognized as an internal or external command, operable program or
 batch file.
 
 
 Can anyone tell me what I am doing wrong here? I am running Cassandra 1.2.3, 
 and this is my first time working with `SSTableLoader`. I am working in a 
 Windows environment.
 
 Is sstableloader supported on Windows? Looking at the source, it seems to be a 
 unix shell file.

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com



RE: sstableloader throughput

2013-04-10 Thread Viktor Jevdokimov
Found https://issues.apache.org/jira/browse/CASSANDRA-3668
Weird.



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
Take a ride with Adform's Rich Media Suite (http://vimeo.com/adform/richmedia)




From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Wednesday, April 10, 2013 11:12
To: user@cassandra.apache.org
Subject: sstableloader throughput

Hi,

We're using Cassandra 1.0.12 sstableloader to import data from a dedicated machine 
located in DC1 into a cluster of 32 nodes (RF=4), 16 nodes in DC1 and 16 
nodes in DC2.

To disable throttling for sstableloader we set in cassandra.yaml:
stream_throughput_outbound_megabits_per_sec: 0

Outgoing network throughput is about 1Gbit; file-copy throughput from the 
dedicated sstableloader machine is 90MB/s to a DC1 node and 50MB/s to a DC2 
node.

But with sstableloader the outgoing network traffic is only 9MB/s: importing 1 
sstable of 480MB into the cluster (~60MB/node) takes 8-9 minutes. Even at 50MB/s 
the import should take less than a minute.

Why sstableloader throughput is so low/slow?
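The mismatch is easy to quantify with the figures above (the 8.5-minute figure is my midpoint of the reported 8-9 minute range):

```python
# Back-of-the-envelope check of the numbers in this thread: a 480 MB sstable,
# an 8-9 minute load, and a 50 MB/s file-copy rate to a DC2 node.
sstable_mb = 480
copy_rate_mb_s = 50          # plain file copy to a DC2 node
observed_minutes = 8.5       # midpoint of the reported 8-9 minutes

expected_seconds = sstable_mb / copy_rate_mb_s          # time at file-copy speed
effective_rate = sstable_mb / (observed_minutes * 60)   # MB/s actually achieved

print(f"at file-copy speed: {expected_seconds:.1f} s")  # 9.6 s, well under a minute
print(f"effective rate: {effective_rate:.2f} MB/s")     # ~0.94 MB/s of sstable data
```

So the loader is moving sstable data at under 1 MB/s while the link can demonstrably sustain 50 MB/s, which is the gap CASSANDRA-3668 describes.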



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
Take a ride with Adform's Rich Media Suite (http://vimeo.com/adform/richmedia)





Re: sstableloader throughput

2013-04-10 Thread Edward Capriolo
sstableloader was slow in 1.0.x; I had better luck with rsync. It was not
fixed in the 1.0.x series.

On Wednesday, April 10, 2013, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
 Found https://issues.apache.org/jira/browse/CASSANDRA-3668

 Weird.







 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer
 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider
 Take a ride with Adform's Rich Media Suite

 From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
 Sent: Wednesday, April 10, 2013 11:12
 To: user@cassandra.apache.org
 Subject: sstableloader throughput



 Hi,



 We’re using Cassandra 1.0.12 sstableloader to import data from dedicated
machine located in DC1 into the cluster of 32 nodes (RF=4), 16 nodes in DC1
and 16 nodes in DC2.



 To disable throttling for sstableloader we set in cassandra.yaml:

 stream_throughput_outbound_megabits_per_sec: 0



 Outgoing network throughput is about 1Gbit, file copy from dedicated
sstableloader machine throughput is 90MB/s into DC1 node and 50MB/s into
DC2 node.



 But with sstableloader outgoing network traffic is only 9MB/s, importing
1 sstable of 480MB into cluster (~60MB/node) takes 8-9 minutes. Even with
50MB/s import should take less than a minute.



 Why sstableloader throughput is so low/slow?







 Best regards / Pagarbiai

 Viktor Jevdokimov

 Senior Developer



 Email: viktor.jevdoki...@adform.com

 Phone: +370 5 212 3063, Fax +370 5 261 0453

 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

 Follow us on Twitter: @adforminsider

 Take a ride with Adform's Rich Media Suite




RE: sstableloader throughput

2013-04-10 Thread Viktor Jevdokimov
Rsync is not for our case.

Is sstableloader for 1.2.x faster?



From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, April 10, 2013 15:52
To: user@cassandra.apache.org
Subject: Re: sstableloader throughput

sstableloader was slow in 1.0.x; I had better luck with rsync. It was not fixed 
in the 1.0.x series.

On Wednesday, April 10, 2013, Viktor Jevdokimov viktor.jevdoki...@adform.com 
wrote:
 Found https://issues.apache.org/jira/browse/CASSANDRA-3668

 Weird.



 Hi,



 We're using Cassandra 1.0.12 sstableloader to import data from dedicated 
 machine located in DC1 into the cluster of 32 nodes (RF=4), 16 nodes in DC1 
 and 16 nodes in DC2.



 To disable throttling for sstableloader we set in cassandra.yaml:

 stream_throughput_outbound_megabits_per_sec: 0



 Outgoing network throughput is about 1Gbit, file copy from dedicated 
 sstableloader machine throughput is 90MB/s into DC1 node and 50MB/s into 
 DC2 node.



 But with sstableloader outgoing network traffic is only 9MB/s, importing 1 
 sstable of 480MB into cluster (~60MB/node) takes 8-9 minutes. Even with 
 50MB/s import should take less than a minute.



 Why sstableloader throughput is so low/slow?


Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania





Re: sstableloader throughput

2013-04-10 Thread Edward Capriolo
Yes. I did confirm that the 1.1 sstableloader works much better than the
1.0 version. The changes were not easy to backport to the 1.0.x branch so
it did not happen. It is likely that 1.2 is even better :)


On Wed, Apr 10, 2013 at 10:38 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

 Rsync is not for our case.

 Is sstableloader for 1.2.x faster?



 From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
 Sent: Wednesday, April 10, 2013 15:52
 To: user@cassandra.apache.org
 Subject: Re: sstableloader throughput

 sstableloader was slow in 1.0.x; I had better luck with rsync. It was not
 fixed in the 1.0.x series.

 On Wednesday, April 10, 2013, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:
  Found https://issues.apache.org/jira/browse/CASSANDRA-3668
 
  Weird.
 
 
 
  Hi,
 
 
 
  We're using Cassandra 1.0.12 sstableloader to import data from dedicated
 machine located in DC1 into the cluster of 32 nodes (RF=4), 16 nodes in DC1
 and 16 nodes in DC2.
 
 
 
  To disable throttling for sstableloader we set in cassandra.yaml:
 
  stream_throughput_outbound_megabits_per_sec: 0
 
 
 
  Outgoing network throughput is about 1Gbit, file copy from dedicated
 sstableloader machine throughput is 90MB/s into DC1 node and 50MB/s into
 DC2 node.
 
 
 
  But with sstableloader outgoing network traffic is only 9MB/s, importing
 1 sstable of 480MB into cluster (~60MB/node) takes 8-9 minutes. Even with
 50MB/s import should take less than a minute.
 
 
 
  Why sstableloader throughput is so low/slow?
 

 Best regards / Pagarbiai

 Viktor Jevdokimov
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063
 Fax: +370 5 261 0453

 J. Jasinskio 16C,
 LT-01112 Vilnius,
 Lithuania






RE: sstableloader throughput

2013-04-10 Thread Viktor Jevdokimov
Thanks, that could help us consider migrating the cluster to a newer 
version. We'll check the difference.


From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Wednesday, April 10, 2013 18:03
To: user@cassandra.apache.org
Subject: Re: sstableloader throughput

Yes. I did confirm that the 1.1 sstableloader works much better than the 1.0 
version. The changes were not easy to backport to the 1.0.x branch so it did 
not happen. It is likely that 1.2 is even better :)

On Wed, Apr 10, 2013 at 10:38 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
Rsync is not for our case.

Is sstableloader for 1.2.x faster?



From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, April 10, 2013 15:52
To: user@cassandra.apache.org
Subject: Re: sstableloader throughput

sstableloader was slow in 1.0.x; I had better luck with rsync. It was not fixed 
in the 1.0.x series.

On Wednesday, April 10, 2013, Viktor Jevdokimov viktor.jevdoki...@adform.com 
wrote:
 Found https://issues.apache.org/jira/browse/CASSANDRA-3668

 Weird.



 Hi,



 We're using Cassandra 1.0.12 sstableloader to import data from dedicated 
 machine located in DC1 into the cluster of 32 nodes (RF=4), 16 nodes in DC1 
 and 16 nodes in DC2.



 To disable throttling for sstableloader we set in cassandra.yaml:

 stream_throughput_outbound_megabits_per_sec: 0



 Outgoing network throughput is about 1Gbit, file copy from dedicated 
 sstableloader machine throughput is 90MB/s into DC1 node and 50MB/s into 
 DC2 node.



 But with sstableloader outgoing network traffic is only 9MB/s, importing 1 
 sstable of 480MB into cluster (~60MB/node) takes 8-9 minutes. Even with 
 50MB/s import should take less than a minute.



 Why sstableloader throughput is so low/slow?


Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania






Re: sstableloader error

2012-08-28 Thread aaron morton
 WARN 21:41:15,200 Failed attempt 1 to connect to /10.245.28.232 to stream 
 null. Retrying in 2 ms. (java.net.ConnectException: Connection timed out)
If you let sstableloader run, does it complete?

 I am running cassandra on foreground. So, on all of the cassandra nodes i get 
 the below message:
  INFO 21:40:30,335 Node /192.168.11.11 is now part of the cluster
This is the bulk load process joining the ring to send the file around. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 28/08/2012, at 10:56 AM, Swathi Vikas swat.vi...@yahoo.com wrote:

 Hi,
  
 I had uploaded data using sstableloader to a single-node cluster earlier 
 without any problem. Now, while trying to upload to a 3-node cluster it is 
 giving me the below error:
  
 localhost:~/apache-cassandra-1.0.7/sstableloader_folder # bin/sstableloader 
 DEMO/
 Starting client (and waiting 30 seconds for gossip) ...
 Streaming revelant part of DEMO/UMD-hc-1-Data.db to [/10.245.28.232, 
 /10.245.28.231, /10.245.28.230]
 progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230 0/0 (100)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 21:41:15,200 
 Failed attempt 1 to connect to /10.245.28.232 to stream null. Retrying in 
 2 ms. (java.net.ConnectException: Connection timed out)
 progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230progress: [/10.245.28.232 0/0 (100)] [/10.245.28.231 0/1 (0)] 
 [/10.245.28.230 0/0 (100)] [total: 0 - 0MB/s (avg: 
 0MB/s)]^Clocalhost:~/apache-cassandra-1.0.7/sstableloader_folder #
 I am running cassandra on foreground. So, on all of the cassandra nodes i get 
 the below message:
  INFO 21:40:30,335 Node /192.168.11.11 is now part of the cluster
  INFO 21:40:30,336 InetAddress /192.168.11.11 is now UP
  INFO 21:41:55,320 InetAddress /192.168.11.11 is now dead.
  INFO 21:41:55,321 FatClient /192.168.11.11 has been silent for 3ms, 
 removing from gossip
 I used ByteOrderPartitioner and filled in the initial token on all nodes.
 I have set seeds as 10.245.28.230,10.245.28.231
 I have properly set listen address, rpc_address(0.0.0.0) and ports
  
 One thing I noticed is that, when I try to connect to this cluster using a 
 client (libQtCassandra) and try to create a column family, all the nodes respond 
 and the column family gets created properly.
  
 Can anyone help me please.
  
 Thanks and Regards,
 Swat.vikas
  
  



RE: sstableloader 1.1 won't stream

2012-05-18 Thread Pieter Callewaert
Hi,

Sorry to say I didn't look further into this. I'm using CentOS 6.2 now for the 
loader without any problems.

Kind regards,
Pieter Callewaert

-Original Message-
From: sj.climber [mailto:sj.clim...@gmail.com] 
Sent: Friday, 18 May 2012 3:56
To: cassandra-u...@incubator.apache.org
Subject: Re: sstableloader 1.1 won't stream

Pieter, Aaron,

Any further progress on this?  I'm running into the same issue, although in my 
case I'm trying to stream from Ubuntu 10.10 to a 2-node cluster (also Cassandra 
1.1.0, and running on separate Ubuntu 10.10 hosts).

Thanks in advance!

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/sstableloader-1-1-won-t-stream-tp7535517p7564811.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.




Re: sstableloader 1.1 won't stream

2012-05-17 Thread sj.climber
Pieter, Aaron,

Any further progress on this?  I'm running into the same issue, although in
my case I'm trying to stream from Ubuntu 10.10 to a 2-node cluster (also
Cassandra 1.1.0, and running on separate Ubuntu 10.10 hosts).

Thanks in advance!

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/sstableloader-1-1-won-t-stream-tp7535517p7564811.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: sstableloader 1.1 won't stream

2012-05-10 Thread Pieter Callewaert
First I disabled IPv6 on the server to be sure it wasn't trying to use 
IPv6, but no effect.

I've tried using sstableloader on one of the Cassandra nodes: no problem there, 
it worked perfectly!
So I was doubting whether the first server was corrupt or something, so I tried on 
another server, CentOS 5.7 x64 with Java 7u04, which is not running a 
Cassandra instance, and again I'm having problems streaming:

[root@bms-web2 ~]# ./apache-cassandra-1.1.0/bin/sstableloader --debug -d 
10.10.10.100 MapData024/HOS/
Streaming revelant part of MapData024/HOS/MapData024-HOS-hc-1-Data.db to 
[/10.10.10.102, /10.10.10.100, /10.10.10.101]

progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
(0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 10:53:18,575 Failed attempt 1 to 
connect to /10.10.10.101 to stream MapData024/HOS/MapData024-HOS-hc-1-Data.db 
sections=2 progress=0/6566400 - 0%. Retrying in 4000 ms. 
(java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 10:53:18,577 Failed attempt 1 to connect to /10.10.10.102 to stream 
MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6557280 - 0%. 
Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
 WARN 10:53:18,594 Failed attempt 1 to connect to /10.10.10.100 to stream 
MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6551840 - 0%. 
Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
(0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 10:53:22,598 Failed attempt 2 to 
connect to /10.10.10.101 to stream MapData024/HOS/MapData024-HOS-hc-1-Data.db 
sections=2 progress=0/6566400 - 0%. Retrying in 8000 ms. 
(java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 10:53:22,601 Failed attempt 2 to connect to /10.10.10.102 to stream 
MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6557280 - 0%. 
Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
 WARN 10:53:22,611 Failed attempt 2 to connect to /10.10.10.100 to stream 
MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6551840 - 0%. 
Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
(0)] [total: 0 - 0MB/s (avg: 0MB/s)]
[root@bms-web2 ~]# java -version
java version 1.7.0_04
Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
[root@bms-web2 ~]# cat /etc/redhat-release
CentOS release 5.7 (Final)

Is it possible that sstableloader now only works if a Cassandra instance is also 
running on the same server? The only other difference I see is CentOS 6.2 vs 
CentOS 5.x.
Does the new sstableloader still use cassandra.yaml, or is it completely 
independent? 

Kind regards

-Original Message-
From: Pieter Callewaert [mailto:pieter.callewa...@be-mobile.be] 
Sent: Wednesday, 9 May 2012 17:41
To: user@cassandra.apache.org
Subject: RE: sstableloader 1.1 won't stream

I don't see any entries in the logs of the nodes.

I've disabled SELinux, to be sure this wasn't a blocking factor, and tried 
adding -Djava.net.preferIPv4Stack=true to bin/sstableloader, but no change 
unfortunately.

To summarize, I'm trying to use sstableloader from a server (CentOS release 5.8 
(Final)) not running Cassandra to a 3-node Cassandra cluster. All running 1.1.
My next step will be to try to use sstableloader on one of the nodes from the 
cluster, to see if that works...

If anyone has any other ideas, please share.

Kind regards,
Pieter Callewaert

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Wednesday, 9 May 2012 10:45
To: user@cassandra.apache.org
Subject: Re: sstableloader 1.1 won't stream

Have you checked for errors in the servers' logs?

--
Sylvain

On Tue, May 8, 2012 at 1:24 PM, Pieter Callewaert 
pieter.callewa...@be-mobile.be wrote:
 I've updated all nodes to 1.1 but I keep getting the same problem...
 Any other thoughts about this?

 Kind regards,
 Pieter

 -Original Message-
 From: Benoit Perroud [mailto:ben...@noisette.ch]
Sent: Monday, 7 May 2012 22:21
 To: user@cassandra.apache.org
 Subject: Re: sstableloader 1.1 won't stream

 You may want to upgrade all your nodes to 1.1.

 The streaming process connects to every living node of the cluster (you can 
 explicitly disable some nodes), so all nodes need to speak 1.1.



 2012/5/7 Pieter Callewaert pieter.callewa...@be-mobile.be:
 Hi,



 I'm trying to upgrade our bulk load process in our testing env.

 We use the SSTableSimpleUnsortedWriter to write tables, and use 
 sstableloader to stream it into our cluster.

 I've changed the writer program to fit to the 1.1 api, but now I'm 
 having troubles to load

Re: sstableloader 1.1 won't stream

2012-05-10 Thread aaron morton
It looks like a networking thing. Is there anything interesting in the network 
config? Are the nodes using a broadcast_address? Can you telnet from the 
machine running sstableloader to port 7000 on 10.10.10.101?
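The telnet check can also be scripted. A minimal sketch using Python's standard socket module; the demo probes a throwaway local listener rather than the storage port 7000 mentioned above, so it runs anywhere, but `can_connect("10.10.10.101", 7000)` is the check that matters in this thread:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demo: probe a port we know is open (a listener we create
# ourselves), then probe the same port again after closing the listener.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))          # let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]
open_result = can_connect("127.0.0.1", port)
srv.close()
closed_result = can_connect("127.0.0.1", port)
print(open_result, closed_result)   # True False
```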

I *think* sstableloader will use the log4j-server.properties log config. Can you 
turn the logging up to debug and see what it says? 
 
It looks like the sstableloader has connected to the cluster on the thrift 
port 9160. It's then tried to run the transfer, which is when this 
happens…

 WARN 10:53:18,575 Failed attempt 1 to connect to /10.10.10.101 to stream 
 MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=2 progress=0/6566400 - 
 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or 
 cannot assign requested address)

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/05/2012, at 9:01 PM, Pieter Callewaert wrote:

 First I disabled IPv6 on the server to be sure it wasn't trying to use 
 IPv6, but no effect.
 
 I've tried using sstableloader on one of the Cassandra nodes: no problem there, 
 it worked perfectly!
 So I was doubting whether the first server was corrupt or something, so I tried on 
 another server, CentOS 5.7 x64 with Java 7u04, which is not running a 
 Cassandra instance, and again I'm having problems streaming:
 
 [root@bms-web2 ~]# ./apache-cassandra-1.1.0/bin/sstableloader --debug -d 
 10.10.10.100 MapData024/HOS/
 Streaming revelant part of MapData024/HOS/MapData024-HOS-hc-1-Data.db to 
 [/10.10.10.102, /10.10.10.100, /10.10.10.101]
 
 progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
 (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 10:53:18,575 Failed attempt 1 to 
 connect to /10.10.10.101 to stream MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=2 progress=0/6566400 - 0%. Retrying in 4000 ms. 
 (java.net.SocketException: Invalid argument or cannot assign requested 
 address)
 WARN 10:53:18,577 Failed attempt 1 to connect to /10.10.10.102 to stream 
 MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6557280 - 
 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or 
 cannot assign requested address)
 WARN 10:53:18,594 Failed attempt 1 to connect to /10.10.10.100 to stream 
 MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6551840 - 
 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or 
 cannot assign requested address)
 progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
 (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 10:53:22,598 Failed attempt 2 to 
 connect to /10.10.10.101 to stream MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=2 progress=0/6566400 - 0%. Retrying in 8000 ms. 
 (java.net.SocketException: Invalid argument or cannot assign requested 
 address)
 WARN 10:53:22,601 Failed attempt 2 to connect to /10.10.10.102 to stream 
 MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6557280 - 
 0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or 
 cannot assign requested address)
 WARN 10:53:22,611 Failed attempt 2 to connect to /10.10.10.100 to stream 
 MapData024/HOS/MapData024-HOS-hc-1-Data.db sections=1 progress=0/6551840 - 
 0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or 
 cannot assign requested address)
 progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] [/10.10.10.101 0/1 
 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
 [root@bms-web2 ~]# java -version
 java version 1.7.0_04
 Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
 Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
 [root@bms-web2 ~]# cat /etc/redhat-release
 CentOS release 5.7 (Final)
 
 Is it possible the sstableloader only works now if a Cassandra instance is 
 also running on the same server? The only other difference I see is CentOS 
 6.2 vs CentOS 5.x
 The new sstableloader, does it still use the Cassandra.yaml or is it 
 completely independent? 
 
 Kind regards
 
 -Original Message-
 From: Pieter Callewaert [mailto:pieter.callewa...@be-mobile.be] 
 Sent: woensdag 9 mei 2012 17:41
 To: user@cassandra.apache.org
 Subject: RE: sstableloader 1.1 won't stream
 
 I don't see any entries in the logs of the nodes.
 
 I've disabled SELinux, to be sure this wasn't a blocking factor, and tried 
 adding -Djava.net.preferIPv4Stack=true to bin/sstableloader, but no change 
 unfortunately.
 
 To summarize, I'm trying to use sstableloader from a server (CentOS release 
 5.8 (Final)) not running Cassandra to a 3-node Cassandra cluster. All running 
 1.1.
 My next step will be to try to use sstableloader on one of the nodes from the 
 cluster, to see if that works...
 
 If anyone has any other ideas, please share.
 
 Kind regards,
 Pieter Callewaert
 
 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: woensdag 9 mei 2012 10:45
 To: user@cassandra.apache.org
 Subject: Re: sstableloader

RE: sstableloader 1.1 won't stream

2012-05-08 Thread Pieter Callewaert
I've updated all nodes to 1.1 but I keep getting the same problem...
Any other thoughts about this?

Kind regards,
Pieter

-Original Message-
From: Benoit Perroud [mailto:ben...@noisette.ch] 
Sent: maandag 7 mei 2012 22:21
To: user@cassandra.apache.org
Subject: Re: sstableloader 1.1 won't stream

You may want to upgrade all your nodes to 1.1.

The streaming process connects to every living node of the cluster (you can 
explicitly disable some nodes), so all nodes need to speak 1.1.



2012/5/7 Pieter Callewaert pieter.callewa...@be-mobile.be:
 Hi,



 I’m trying to upgrade our bulk load process in our testing env.

 We use the SSTableSimpleUnsortedWriter to write tables, and use 
 sstableloader to stream it into our cluster.

 I’ve changed the writer program to fit the 1.1 API, but now I’m 
 having trouble loading them into our cluster. The cluster consists of 
 one 1.1 node and two 1.0.9 nodes.



 I’ve enabled debug as parameter and in the log4j conf.



 [root@bms-app1 ~]# ./apache-cassandra/bin/sstableloader --debug -d
 10.10.10.100 /tmp/201205071234/MapData024/HOS/

 INFO 16:25:40,735 Opening
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1 (1588949 bytes)

 INFO 16:25:40,755 JNA not found. Native methods will be disabled.

 DEBUG 16:25:41,060 INDEX LOAD TIME for
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1: 327 ms.

 Streaming revelant part of
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to 
 [/10.10.10.102, /10.10.10.100, /10.10.10.101]

 INFO 16:25:41,083 Stream context metadata 
 [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6557280 - 0%], 1 sstables.

 DEBUG 16:25:41,084 Adding file
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.

 INFO 16:25:41,087 Streaming to /10.10.10.102

 DEBUG 16:25:41,092 Files are
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6557280 - 0%

 INFO 16:25:41,099 Stream context metadata 
 [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6551840 - 0%], 1 sstables.

 DEBUG 16:25:41,100 Adding file
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.

 INFO 16:25:41,100 Streaming to /10.10.10.100

 DEBUG 16:25:41,100 Files are
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6551840 - 0%

 INFO 16:25:41,102 Stream context metadata 
 [/tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=2
 progress=0/6566400 - 0%], 1 sstables.

 DEBUG 16:25:41,102 Adding file
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db to be streamed.

 INFO 16:25:41,102 Streaming to /10.10.10.101

 DEBUG 16:25:41,102 Files are
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=2
 progress=0/6566400 - 0%



 progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] 
 [/10.10.10.101 0/1 (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 
 16:25:41,107 Failed attempt 1 to connect to /10.10.10.101 to stream 
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=2
 progress=0/6566400 - 0%. Retrying in 4000 ms. (java.net.SocketException:
 Invalid argument or cannot assign requested address)

 WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.102 to 
 stream /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6557280 - 0%. Retrying in 4000 ms. (java.net.SocketException:
 Invalid argument or cannot assign requested address)

 WARN 16:25:41,108 Failed attempt 1 to connect to /10.10.10.100 to 
 stream /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6551840 - 0%. Retrying in 4000 ms. (java.net.SocketException:
 Invalid argument or cannot assign requested address)

 progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] 
 [/10.10.10.101 0/1 (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 
 16:25:45,109 Failed attempt 2 to connect to /10.10.10.101 to stream 
 /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=2
 progress=0/6566400 - 0%. Retrying in 8000 ms. (java.net.SocketException:
 Invalid argument or cannot assign requested address)

 WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.102 to 
 stream /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6557280 - 0%. Retrying in 8000 ms. (java.net.SocketException:
 Invalid argument or cannot assign requested address)

 WARN 16:25:45,110 Failed attempt 2 to connect to /10.10.10.100 to 
 stream /tmp/201205071234/MapData024/HOS/MapData024-HOS-hc-1-Data.db 
 sections=1
 progress=0/6551840 - 0%. Retrying in 8000 ms. (java.net.SocketException:
 Invalid argument or cannot assign requested address)

 progress: [/10.10.10.102 0/1 (0)] [/10.10.10.100 0/1 (0)] 
 [/10.10.10.101 0/1 (0)] [total: 0 - 0MB/s (avg: 0MB/s)] WARN 
 16:25:53,113 Failed attempt 3 to connect to /10.10.10.101 to stream 
 /tmp/201205071234/MapData024/HOS

RE: sstableloader throws storage_port error

2011-08-11 Thread Tom Davidson
I am trying to use sstableloader, and I do not want to run it on a Cassandra 
node. I have edited my cassandra.yaml with appropriate values for 
listen_address and rpc_address, but I keep getting the error below. The 
cassandra-cli tool, nodetool, etc. work fine when connecting to my Cassandra 
cluster, but sstableloader does not. Any suggestions?

[tdavidson@nadevsan06 ~]$ sstableloader --debug -v Demo
Starting client (and waiting 30 seconds for gossip) ...
org.apache.cassandra.config.ConfigurationException: Unable to bind to address 
nadevsan04/10.168.121.57:7000. Set listen_address in cassandra.yaml to an 
interface you can bind to, e.g., your private IP address on EC2
java.lang.RuntimeException: org.apache.cassandra.config.ConfigurationException: 
Unable to bind to address nadevsan04/10.168.121.57:7000. Set listen_address in 
cassandra.yaml to an interface you can bind to, e.g., your private IP address 
on EC2
at 
org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:225)
at 
org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:104)
at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:61)
Caused by: org.apache.cassandra.config.ConfigurationException: Unable to bind 
to address nadevsan04/10.168.121.57:7000. Set listen_address in cassandra.yaml 
to an interface you can bind to, e.g., your private IP address on EC2
at 
org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:220)
at 
org.apache.cassandra.net.MessagingService.listen(MessagingService.java:191)
at 
org.apache.cassandra.service.StorageService.initClient(StorageService.java:350)
at 
org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:182)
... 2 more

From: John Conwell [mailto:j...@iamjohn.me]
Sent: Tuesday, July 26, 2011 12:11 PM
To: user@cassandra.apache.org
Subject: Re: sstableloader throws storage_port error

After much research and experimentation, I figured out how to get sstableloader 
running on the same machine as a live cassandra node instance.

The key, as Jonathan stated, is to configure sstableloader to use a different 
IP address than the running cassandra instance is using.  To do this, I ran 
this command, which created the loopback alias 127.0.0.2


sudo ifconfig lo0 alias 127.0.0.2

Now you can have cassandra configured to listen on 127.0.0.1, and sstableloader 
configured to listen on 127.0.0.2

By the way, to remove this IP address, run

sudo ifconfig lo0 -alias 127.0.0.2

But that's not really all.  Because sstableloader reads the cassandra.yaml file 
to get the gossip IP address, you need to make a copy of the cassandra install 
directory (or at least the bin and conf folders).  Basically one folder with 
the yaml configured for Cassandra, the other folder with the yaml configured for 
sstableloader.
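
A hedged shell sketch of the steps just described; the /opt/cassandra paths
are examples, and the Linux `ip` variants are assumptions for systems
without BSD-style `ifconfig` aliases:

```shell
# 1. Add a second loopback address for the loader to bind to.
sudo ifconfig lo0 alias 127.0.0.2        # macOS/BSD
# sudo ip addr add 127.0.0.2/8 dev lo    # Linux (iproute2) equivalent

# 2. Copy the install so the loader gets its own cassandra.yaml.
cp -r /opt/cassandra /opt/cassandra-loader
sed -i.bak 's/^listen_address:.*/listen_address: 127.0.0.2/' \
    /opt/cassandra-loader/conf/cassandra.yaml

# 3. Run the loader from the copied tree, against the SSTable directory.
/opt/cassandra-loader/bin/sstableloader /path/to/keyspace/columnfamily/

# 4. Remove the alias afterwards.
sudo ifconfig lo0 -alias 127.0.0.2       # macOS/BSD
# sudo ip addr del 127.0.0.2/8 dev lo    # Linux
```

The node keeps 127.0.0.1 in its own yaml; only the loader's copy is edited.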

Hope this helps people. I've written an in-depth description of how to do all 
this, and can post it if people want, but I'm not sure about the etiquette of 
posting blog links on the mailing list.

Thanks,
John

On Tue, Jul 26, 2011 at 7:40 AM, John Conwell 
j...@iamjohn.memailto:j...@iamjohn.me wrote:
If I have Cassandra already running on my machine, how do I configure 
sstableloader to run on a different IP (127.0.0.2)? Also, does that mean that in 
order to use sstableloader on the same machine as a running Cassandra node, I 
have to have two NIC cards?

I looked around for any info about how to configure and run sstableloader, but 
other than what the cmdline spits out I can't find anything. Are there any 
examples or best practices?  Is it designed to be run on a machine that isn't 
running a cassandra node?

On Mon, Jul 25, 2011 at 8:24 PM, Jonathan Ellis 
jbel...@gmail.commailto:jbel...@gmail.com wrote:
sstableloader uses gossip to discover the Cassandra ring, so you'll
need to run it on a different IP (127.0.0.2 is fine).

On Mon, Jul 25, 2011 at 2:41 PM, John Conwell 
j...@iamjohn.memailto:j...@iamjohn.me wrote:
 I'm trying to figure out how to use the sstableloader tool.  For my test I
 have a single node cassandra instance running on my local machine.  I have
 cassandra running, and validate this by connecting to it with cassandra-cli.
 I run sstableloader using the following command:
 bin/sstableloader /Users/someuser/cassandra/mykeyspace
 and I get the following error:
 org.apache.cassandra.config.ConfigurationException: 
 localhost/127.0.0.1:7000http://127.0.0.1:7000
 is in use by another process.  Change listen_address:storage_port in
 cassandra.yaml to values that do not conflict with other services

 I've played around with different ports, but nothing works.  Is it because
 I'm trying to run sstableloader on the same machine that cassandra is
 running on?  It would be odd, I think, but I can't think of another reason I
 would get that error.
 Thanks,
 John


--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source

Re: sstableloader throws storage_port error

2011-08-11 Thread Jonathan Ellis
"Unable to bind to address nadevsan04/10.168.121.57:7000" means
something else is using that address/port.  netstat can tell you which
process that is, if you're not sure.
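
For example, a sketch of that check on Linux (`ss` shown here as the newer
equivalent of `netstat -tlnp`; flags are common but verify on your platform):

```shell
# Show which process, if any, is listening on the storage port (7000 here).
PORT=7000
# -t TCP, -l listening sockets, -n numeric, -p owning process
# (seeing other users' PIDs usually requires root)
ss -tlnp 2>/dev/null | grep ":${PORT} " || echo "nothing listening on ${PORT}"
```

If a cassandra JVM already owns the port, the loader must bind a different
address, as discussed elsewhere in this thread.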

On Thu, Aug 11, 2011 at 4:24 PM, Tom Davidson tdavid...@covario.com wrote:
 I am trying to use sstableloader, and I do not want to run it on a Cassandra
 node. I have edited my cassandra.yaml with appropriate values for
 listen_address and rpc_address, but I keep getting the error below. The
 cassandra-cli tool, nodetool, etc. work fine when connecting to my Cassandra
 cluster, but sstableloader does not. Any suggestions?



 [tdavidson@nadevsan06 ~]$ sstableloader --debug -v Demo

 Starting client (and waiting 30 seconds for gossip) ...

 org.apache.cassandra.config.ConfigurationException: Unable to bind to
 address nadevsan04/10.168.121.57:7000. Set listen_address in cassandra.yaml
 to an interface you can bind to, e.g., your private IP address on EC2

 java.lang.RuntimeException:
 org.apache.cassandra.config.ConfigurationException: Unable to bind to
 address nadevsan04/10.168.121.57:7000. Set listen_address in cassandra.yaml
 to an interface you can bind to, e.g., your private IP address on EC2

     at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:225)

     at
 org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:104)

     at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:61)

 Caused by: org.apache.cassandra.config.ConfigurationException: Unable to
 bind to address nadevsan04/10.168.121.57:7000. Set listen_address in
 cassandra.yaml to an interface you can bind to, e.g., your private IP
 address on EC2

     at
 org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:220)

     at
 org.apache.cassandra.net.MessagingService.listen(MessagingService.java:191)

     at
 org.apache.cassandra.service.StorageService.initClient(StorageService.java:350)

     at
 org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:182)

     ... 2 more



 From: John Conwell [mailto:j...@iamjohn.me]
 Sent: Tuesday, July 26, 2011 12:11 PM
 To: user@cassandra.apache.org
 Subject: Re: sstableloader throws storage_port error



 After much research and experimentation, I figured out how to get
 sstableloader running on the same machine as a live cassandra node instance.



 The key, as Jonathan stated, is to configure sstableloader to use a different
 IP address than the running cassandra instance is using.  To do this, I ran
 this command, which created the loopback alias 127.0.0.2



 sudo ifconfig lo0 alias 127.0.0.2



 Now you can have cassandra configured to listen on 127.0.0.1, and
 sstableloader configured to listen on 127.0.0.2



 By the way, to remove this IP address, run

 sudo ifconfig lo0 -alias 127.0.0.2



 But that's not really all.  Because sstableloader reads the cassandra.yaml
 file to get the gossip IP address, you need to make a copy of the cassandra
 install directory (or at least the bin and conf folders). Basically one
 folder with the yaml configured for Cassandra, the other folder with the yaml
 configured for sstableloader.



 Hope this helps people. I've written an in-depth description of how to do
 all this, and can post it if people want, but I'm not sure about the etiquette
 of posting blog links on the mailing list.



 Thanks,
 John



 On Tue, Jul 26, 2011 at 7:40 AM, John Conwell j...@iamjohn.me wrote:

 If I have Cassandra already running on my machine, how do I configure
 sstableloader to run on a different IP (127.0.0.2)? Also, does that mean that
 in order to use sstableloader on the same machine as a running Cassandra node,
 I have to have two NIC cards?



 I looked around for any info about how to configure and run sstableloader,
 but other than what the cmdline spits out I can't find anything.  Are there
 any examples or best practices?  Is it designed to be run on a machine that
 isn't running a cassandra node?



 On Mon, Jul 25, 2011 at 8:24 PM, Jonathan Ellis jbel...@gmail.com wrote:

 sstableloader uses gossip to discover the Cassandra ring, so you'll
 need to run it on a different IP (127.0.0.2 is fine).

 On Mon, Jul 25, 2011 at 2:41 PM, John Conwell j...@iamjohn.me wrote:
 I'm trying to figure out how to use the sstableloader tool.  For my test I
 have a single node cassandra instance running on my local machine.  I have
 cassandra running, and validate this by connecting to it with
 cassandra-cli.
 I run sstableloader using the following command:
 bin/sstableloader /Users/someuser/cassandra/mykeyspace
 and I get the following error:
 org.apache.cassandra.config.ConfigurationException:
 localhost/127.0.0.1:7000
 is in use by another process.  Change listen_address:storage_port in
 cassandra.yaml to values that do not conflict with other services

 I've played around with different ports, but nothing works.  Is it because
 I'm trying to run sstableloader on the same machine

Re: sstableloader throws storage_port error

2011-07-26 Thread John Conwell
If I have Cassandra already running on my machine, how do I configure
sstableloader to run on a different IP (127.0.0.2)? Also, does that mean that in
order to use sstableloader on the same machine as a running Cassandra node,
I have to have two NIC cards?

I looked around for any info about how to configure and run sstableloader,
but other than what the cmdline spits out I can't find anything.  Are there
any examples or best practices?  Is it designed to be run on a machine that
isn't running a cassandra node?


On Mon, Jul 25, 2011 at 8:24 PM, Jonathan Ellis jbel...@gmail.com wrote:

 sstableloader uses gossip to discover the Cassandra ring, so you'll
 need to run it on a different IP (127.0.0.2 is fine).

 On Mon, Jul 25, 2011 at 2:41 PM, John Conwell j...@iamjohn.me wrote:
  I'm trying to figure out how to use the sstableloader tool.  For my test
 I
  have a single node cassandra instance running on my local machine.  I
 have
  cassandra running, and validate this by connecting to it with
 cassandra-cli.
  I run sstableloader using the following command:
  bin/sstableloader /Users/someuser/cassandra/mykeyspace
  and I get the following error:
  org.apache.cassandra.config.ConfigurationException: localhost/
 127.0.0.1:7000
  is in use by another process.  Change listen_address:storage_port in
  cassandra.yaml to values that do not conflict with other services
 
  I've played around with different ports, but nothing works.  Is it because
  I'm trying to run sstableloader on the same machine that cassandra is
  running on?  It would be odd, I think, but I can't think of another reason I
  would get that error.
  Thanks,
  John



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




-- 

Thanks,
John C


Re: sstableloader throws storage_port error

2011-07-26 Thread John Conwell
After much research and experimentation, I figured out how to get
sstableloader running on the same machine as a live cassandra node instance.

The key, as Jonathan stated, is to configure sstableloader to use a different
IP address than the running cassandra instance is using.  To do this, I ran
this command, which created the loopback alias 127.0.0.2

   sudo ifconfig lo0 alias 127.0.0.2

Now you can have cassandra configured to listen on 127.0.0.1, and
sstableloader configured to listen on 127.0.0.2

By the way, to remove this IP address, run

sudo ifconfig lo0 -alias 127.0.0.2

But that's not really all.  Because sstableloader reads the cassandra.yaml
file to get the gossip IP address, you need to make a copy of the cassandra
install directory (or at least the bin and conf folders).  Basically one
folder with the yaml configured for Cassandra, the other folder with the yaml
configured for sstableloader.

Hope this helps people. I've written an in-depth description of how to do
all this, and can post it if people want, but I'm not sure about the etiquette
of posting blog links on the mailing list.

Thanks,
John

On Tue, Jul 26, 2011 at 7:40 AM, John Conwell j...@iamjohn.me wrote:

 If I have Cassandra already running on my machine, how do I configure
 sstableloader to run on a different IP (127.0.0.2)? Also, does that mean that
 in order to use sstableloader on the same machine as a running Cassandra node,
 I have to have two NIC cards?

 I looked around for any info about how to configure and run sstableloader,
 but other than what the cmdline spits out I can't find anything.  Are there
 any examples or best practices?  Is it designed to be run on a machine that
 isn't running a cassandra node?


 On Mon, Jul 25, 2011 at 8:24 PM, Jonathan Ellis jbel...@gmail.com wrote:

 sstableloader uses gossip to discover the Cassandra ring, so you'll
 need to run it on a different IP (127.0.0.2 is fine).

 On Mon, Jul 25, 2011 at 2:41 PM, John Conwell j...@iamjohn.me wrote:
  I'm trying to figure out how to use the sstableloader tool.  For my test
 I
  have a single node cassandra instance running on my local machine.  I
 have
  cassandra running, and validate this by connecting to it with
 cassandra-cli.
  I run sstableloader using the following command:
  bin/sstableloader /Users/someuser/cassandra/mykeyspace
  and I get the following error:
  org.apache.cassandra.config.ConfigurationException: localhost/
 127.0.0.1:7000
  is in use by another process.  Change listen_address:storage_port in
  cassandra.yaml to values that do not conflict with other services
 
  I've played around with different ports, but nothing works.  Is it because
  I'm trying to run sstableloader on the same machine that cassandra is
  running on?  It would be odd, I think, but I can't think of another reason I
  would get that error.
  Thanks,
  John



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




 --

 Thanks,
 John C




-- 

Thanks,
John C


Re: sstableloader throws storage_port error

2011-07-25 Thread Jonathan Ellis
sstableloader uses gossip to discover the Cassandra ring, so you'll
need to run it on a different IP (127.0.0.2 is fine).

On Mon, Jul 25, 2011 at 2:41 PM, John Conwell j...@iamjohn.me wrote:
 I'm trying to figure out how to use the sstableloader tool.  For my test I
 have a single node cassandra instance running on my local machine.  I have
 cassandra running, and validate this by connecting to it with cassandra-cli.
 I run sstableloader using the following command:
 bin/sstableloader /Users/someuser/cassandra/mykeyspace
 and I get the following error:
 org.apache.cassandra.config.ConfigurationException: localhost/127.0.0.1:7000
 is in use by another process.  Change listen_address:storage_port in
 cassandra.yaml to values that do not conflict with other services

 I've played around with different ports, but nothing works.  Is it because
 I'm trying to run sstableloader on the same machine that cassandra is
 running on?  It would be odd, I think, but I can't think of another reason I
 would get that error.
 Thanks,
 John



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com