Re: sstablesplit - status

2017-05-18 Thread Jan Kesten

Hi again,

and thanks for the input. I don't think it's tombstoned data; rather, over a 
really long time the same rows are inserted over and over again, with some 
significant pauses between the inserts. I found examples where a specific row 
(for example pk=xyz, value=123) exists in more than one or two sstables, with 
exactly the same content but different timestamps.


The largest sstables, compacted a while ago, are now 300-400 GB in size on 
some nodes, and it's very unlikely they will be compacted again anytime soon 
as there are only one or two sstables of that size on a single node.


I think I will try re-bootstrapping a node to see if that helps. sstablesplit 
exists in 2.x - but as far as I know it is deprecated, and in my 3.6 
test cluster it was gone.


I was trying sstabledump to take a deeper look - but that says "pre-3.0 
SSTable is not supported" (fair enough, I am on a 2.2.8 cluster).


Jan





sstablesplit - status

2017-05-17 Thread Jan Kesten

Hi all,

I have a problem with some really large sstables which don't get compacted 
anymore, and I know there are many duplicated rows in them. I thought splitting 
these sstables into smaller ones would get them compacted again, so I tried 
sstablesplit, but:


cassandra@cassandra01 /tmp/cassandra $ 
./apache-cassandra-3.10/tools/bin/sstablesplit lb-388151-big-Data.db

Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split
cassandra@cassandra01 /tmp/cassandra $ sstablesplit lb-388151-big-Data.db
Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split

It seems that sstablesplit can't handle the "new" filename pattern 
anymore (I am actually running 2.2.8 on those nodes).


Any hints or other suggestions to split those sstables or get rid of them?

Thanks in advance,
Jan




Re: Read after Write inconsistent at times

2017-02-24 Thread Jan Kesten

Hi,

are your nodes under high load? Are there any dropped messages (nodetool 
tpstats) on any node?


Also have a look at your system clocks. C* needs them in tight sync - via NTP 
for example. Side hint: if you use NTP, use the same set of upstreams on all of 
your nodes - ideally your own server. Using pool.ntp.org 
might lead to small drifts in time across your cluster.


Another thing that could help you out is using client-side timestamps: 
https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/ 
(of course only when you are using a single client, or when all clients are in 
sync via NTP).
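
A minimal sketch of what that looks like with the Java driver 3.x (the contact 
point, keyspace and table name below are just placeholders; recent 3.x drivers 
already use a client-side generator by default):

import com.datastax.driver.core.AtomicMonotonicTimestampGenerator;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class ClientTimestamps {
    public static void main(String[] args) {
        // Write timestamps are generated on the client (monotonically per JVM)
        // instead of on the coordinator node.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")   // placeholder contact point
                .withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
                .build();
             Session session = cluster.connect("my_ks")) {   // placeholder keyspace
            // placeholder table and values
            session.execute("INSERT INTO my_table (pk, value) VALUES ('xyz', '123')");
        }
    }
}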



On 24.02.2017 at 07:29, Charulata Sharma (charshar) wrote:


Hi All,

In my application sometimes I cannot read data that just got inserted. 
This happens very intermittently. Both write and read use LOCAL_QUORUM.


We have a cluster of 12 nodes which spans across 2 Data Centers and a 
RF of 3.


Has anyone encountered this problem, and if yes, what steps have you 
taken to solve it?


Thanks,
Charu


--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68
enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO)




Re: Cluster scaling

2017-02-08 Thread Jan Kesten

Hi Branislav,

what is it you would expect?

Some thoughts:

Batches are often misunderstood: they work well only if they contain a single 
partition key - think of a batch of different sensor readings for one key. If 
you group many partition keys into a batch and/or make the batches large, this 
puts high load on the coordinator node, which then itself needs to talk to the 
nodes holding the partitions. This could explain the scaling you see in your 
second try without batches. Keep in mind that the driver supports executeAsync 
and ResultSetFutures.
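
For example, something along these lines instead of large multi-partition 
batches - just a sketch, with made-up keyspace, table and column names:

import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;

import java.util.ArrayList;
import java.util.List;

public class AsyncInserts {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {   // placeholder keyspace
            PreparedStatement insert = session.prepare(
                    "INSERT INTO sensor_data (sensor_id, ts, value) VALUES (?, ?, ?)");

            // Fire the inserts without blocking; in real code you would cap the
            // number of in-flight futures so you do not overload the cluster.
            List<ResultSetFuture> futures = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                BoundStatement bound = insert.bind("sensor-" + (i % 10), (long) i, (double) i);
                futures.add(session.executeAsync(bound));
            }

            // Wait for all writes to finish and surface any errors.
            for (ResultSetFuture future : futures) {
                future.getUninterruptibly();
            }
        }
    }
}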


Second, put commitlog and data directories on separate disks when using 
spindles.


Third, have you monitored I/O (iostat) and CPU statistics while running your tests?

Cheers,

Jan

On 08.02.2017 at 16:39, Branislav Janosik -T (bjanosik - AAP3 INC 
at Cisco) wrote:


Hi all,

I have a cluster of three nodes and would like to ask some questions 
about the performance.


I wrote a small benchmarking tool in java that mirrors (read, write) 
operations that we do in the real project.


The problem is that it is not scaling like it should. The program runs two 
tests: one using a batch statement and one without the batch.


The operation sequence is: optional select, insert, update, insert. I run the 
tool on my server with 128 threads (the number of threads has no influence on 
the performance), usually creating 100K resources for testing purposes.

The average results (operations per second) with the use of batch 
statement are:


Replication Factor = 1   with reading   without reading
1-node cluster           37K            46K
2-node cluster           37K            47K
3-node cluster           39K            70K

Replication Factor = 2   with reading   without reading
2-node cluster           21K            40K
3-node cluster           30K            48K

The average results (operations per second) without the use of batch 
statement are:


Replication Factor = 1   with reading   without reading
1-node cluster           31K            20K
2-node cluster           38K            39K
3-node cluster           45K            87K

Replication Factor = 2   with reading   without reading
2-node cluster           19K            22K
3-node cluster           26K            36K

The Cassandra VM specs are: 16 CPUs, 16 GB of RAM (two of the VMs have 32 GB), 
and at least 30 GB of disk space per node. Non-SSD storage; each VM is on a 
separate physical server.


The code is available here: 
https://github.com/bjanosik/CassandraBenchTool.git . It can be built 
with Maven and then run from the target directory with java -jar 
target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar .


Thank you for any help.



--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68
enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO)




Re: Hotspots / Load on Cassandra node

2016-10-25 Thread Jan Kesten
Hi,

can you check the size of your data directories on that machine and compare 
them to the other nodes?

Have a look for snapshot directories which could still be there from a former 
table or keyspace.

Regards,
Jan

On 26 October 2016 at 06:53:03 CEST, Harikrishnan A wrote:
>Hello,
>When I am issuing nodetool status, I see the load (in GB) on one of
>the nodes is high compared to the other nodes in my ring.
>I do not see any issues with the data modeling, and it looks like the
>partition sizes are almost evenly sized and distributed across the
>nodes. Repairs are running properly.
>How do I approach and fix this issue?
>
>Thanks & Regards,
>Hari

-- 
This message was sent from my Android mobile phone with K-9 Mail.

Re: Thousands of SSTables generated in only one node

2016-10-25 Thread Jan Kesten

Hi Lahiru,

2.1.0 is also quite old (Sep 2014) - and from memory I recall there was an 
issue we had with cold_reads_to_omit:


http://grokbase.com/t/cassandra/user/1523sm4y0r/how-to-deal-with-too-many-sstables
https://www.mail-archive.com/search?l=user@cassandra.apache.org=subject:%22Re%3A+Compaction+failing+to+trigger%22=newest=1

Those are just random Google hits, but maybe they help too.

Back then I ended up with a few thousand sstables smaller than 1 MB in size. 
However, I would suggest upgrading to a newer version of Cassandra - maybe 
2.1.16 or 2.2.8 - before diving too deep into this, as chances are really good 
your problems will be gone after that.


Regards.
Jan



Re: Thousands of SSTables generated in only one node

2016-10-25 Thread Jan Kesten

Hi Lahiru,

maybe your node was running out of memory before. I have seen this behaviour 
when the available heap is low, forcing memtables to be flushed to sstables 
quite often.


If this is what is hitting you, you should see that the sstables 
are really small.


To clean up, nodetool compact would do the job - but if you do not need the 
data from one of the keyspaces at all, just drop and recreate it (but check 
your data directory for leftover snapshots). To prevent this in the future, 
keep a close eye on heap consumption and maybe give the node more 
memory.


HTH,
Jan


Re: Large primary keys

2016-04-11 Thread Jan Kesten

Hi Robert,

why do you need the actual text as a key? It sounds a bit unnatural, at least 
to me. Keep in mind that you cannot do "like" queries on keys in 
Cassandra. For performance, and to keep things more readable, I would 
prefer hashing your text and using the hash as the key.
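
A minimal sketch of what I mean - a SHA-256 of the document text, stored as a 
hex string key:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DocumentKey {

    /** Returns a fixed-size hex key derived from an arbitrarily large text. */
    public static String keyFor(String documentText) throws NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha256.digest(documentText.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));   // two hex chars per byte
        }
        return hex.toString();   // 64 characters, regardless of input size
    }
}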


You should also consider storing the keys (hashes) in a separate table 
bucketed per day / hour or something like that, so you can quickly 
get all keys for a time range. A query without the partition key may be 
very slow.


Jan

On 11.04.2016 at 23:43, Robert Wille wrote:

I have a need to be able to use the text of a document as the primary key in a 
table. These texts are usually less than 1K, but can sometimes be 10’s of K’s 
in size. Would it be better to use a digest of the text as the key? I have a 
background process that will occasionally need to do a full table scan and 
retrieve all of the texts, so using the digest doesn’t eliminate the need to 
store the text. Anyway, is it better to keep primary keys small, or is C* okay 
with large primary keys?

Robert





Re: NTP Synchronization Setup Changes

2016-03-30 Thread Jan Kesten
Hi Mickey,

I would strongly suggest setting up an NTP server on your site - this is not 
really a big deal and is quickly done with some tutorials from the net. Then 
configure your Cassandra nodes (and all the rest if you like) to use your NTP 
server instead of public ones. As I have learned the hard way, Cassandra is not 
really happy when nodes have different times in some cases.

The benefit of this is that your nodes will keep their time in sync even 
without a connection to the internet. Of course "your time" may drift without a 
proper time source or connection, but all nodes will have the same drift and so 
there are no problems with consistency. Once your NTP server syncs again, your 
nodes will be adjusted smoothly.

Pro(?) solution (what I did before): attach a GPS mouse (receiver) to your NTP 
server and use that as the time source. That way you have synchronized _and_ 
accurate time without any connection to public NTP servers, as the GPS 
satellites are flying atomic clocks :)

Just my 2 cents,
Jan

Sent from my iPhone

> On 31.03.2016 at 03:07, Mukil Kesavan wrote:
> 
> Hi,
> 
> We run a 3 server cassandra cluster that is initially NTP synced to a single 
> physical server over LAN. This server does not have connectivity to the 
> internet for a few hours to sometimes even days. In this state we perform 
> some schema operations and reads/writes with QUORUM consistency.
> 
> Later on, the physical server has connectivity to the internet and we 
> synchronize its time to an external NTP server on the internet. 
> 
> Are there any issues if this causes a huge time correction on the cassandra 
> cluster? I know that NTP gradually corrects the time on all the servers. I 
> just wanted to understand if there were any corner cases that will cause us 
> to lose data/schema updates when this happens. In particular, we seem to be 
> having some issues around missing secondary indices at the moment (not all 
> but some).
> 
> Also, for our situation where we have to work with cassandra for a while 
> without internet connectivity, what is the preferred NTP architecture/steps 
> that have worked for you in the field?
> 
> Thanks,
> Micky



Thrift composite partition key to cql migration

2016-03-30 Thread Jan Kesten

Hi,

while migrating the remainder of the Thrift operations in my application I 
came across a point where I can't find a good hint.


In our old code we used a composite with two strings as row / partition 
key and a similar composite as column key like this:


public Composite rowKey() {
    final Composite composite = new Composite();
    composite.addComponent(key1, StringSerializer.get());
    composite.addComponent(key2, StringSerializer.get());
    return composite;
}

public Composite columnKey() {
    final Composite composite = new Composite();
    composite.addComponent(key3, StringSerializer.get());
    composite.addComponent(key4, StringSerializer.get());
    return composite;
}

In CQL this column family looks like this:

CREATE TABLE foo.bar (
    key blob,
    column1 text,
    column2 text,
    value blob,
    PRIMARY KEY (key, column1, column2)
)

For the columns, key3 and key4 became column1 and column2 - but the old 
row key is presented as a blob (I can put it into a hex editor and see that 
the key1 and key2 values are in there).
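
(For reference, the packed key can at least be taken apart client-side. A 
minimal sketch, assuming the usual composite encoding of a 2-byte length, the 
value and a trailing end-of-component byte per component - please verify 
against your own data:)

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CompositeKeySplitter {

    /** Splits a legacy composite-packed key blob into its UTF-8 components. */
    public static List<String> split(ByteBuffer packedKey) {
        ByteBuffer buf = packedKey.duplicate();
        List<String> components = new ArrayList<>();
        while (buf.remaining() > 0) {
            int length = buf.getShort() & 0xFFFF;   // 2-byte component length
            byte[] value = new byte[length];
            buf.get(value);                          // component bytes (key1, key2, ...)
            buf.get();                               // skip the end-of-component byte
            components.add(new String(value, StandardCharsets.UTF_8));
        }
        return components;
    }
}

Called for example with split(row.getBytes("key")) on a Row from the driver.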


Any pointers on how to handle this, or is this a known issue? I am now using 
the DataStax Java driver for CQL; the old connector used Thrift. Is there any 
way to get key1 and key2 back, apart from completely rewriting the table? 
This is what I had expected it to look like:


CREATE TABLE foo.bar (
    key1 text,
    key2 text,
    column1 text,
    column2 text,
    value blob,
    PRIMARY KEY ((key1, key2), column1, column2)
)

Cheers,
Jan


Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Jan Kesten
Hi Branton,

two cents from me - I didn't look through the script, but for the rsyncs I do 
pretty much the same when moving sstables. Since they are immutable, I do a 
first sync to the new location while everything is up and running, which takes 
really long. Meanwhile new sstables are created, so I sync them again online - 
far fewer files to copy now. After that I shut down the node and the last rsync 
only has to copy a few files, which is quite fast, so the downtime for that 
node is within minutes.

Jan



Sent from my iPhone

> On 18.02.2016 at 22:12, Branton Davis wrote:
> 
> Alain, thanks for sharing!  I'm confused why you do so many repetitive 
> rsyncs.  Just being cautious or is there another reason?  Also, why do you 
> have --delete-before when you're copying data to a temp (assumed empty) 
> directory?
> 
>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ  wrote:
>> I did the process a few weeks ago and ended up writing a runbook and a 
>> script. I have anonymised and share it fwiw.
>> 
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>> 
>> It is basic bash. I tried to have the shortest down time possible, making 
>> this a bit more complex, but it allows you to do a lot in parallel and just 
>> do a fast operation sequentially, reducing overall operation time.
>> 
>> This worked fine for me, yet I might have made some errors while making it 
>> configurable through variables. Be sure to be around if you decide to run 
>> this. Also, I automated this further by using knife (Chef); I hate to repeat 
>> ops, so this is something you might want to consider.
>> 
>> Hope this is useful,
>> 
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>> 
>> The Last Pickle
>> http://www.thelastpickle.com
>> 
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal :
>>> Hey Branton,
>>> 
>>> Please do let us know if you face any problems  doing this.
>>> 
>>> Thanks
>>> anishek
>>> 
 On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis 
  wrote:
 We're about to do the same thing.  It shouldn't be necessary to shut down 
 the entire cluster, right?
 
> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli  
> wrote:
> 
> 
>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal  
>> wrote:
>> To accomplish this can I just copy the data from disk1 to disk2 with in 
>> the relevant cassandra home location folders, change the cassanda.yaml 
>> configuration and restart the node. before starting i will shutdown the 
>> cluster.
> 
> Yes.
> 
> =Rob
> 


Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread Jan Kesten
Hi,

An embedded Cassandra to speed up getting into the project may well work for 
developers - we used it for JUnit tests. But with a simple clone and Maven 
build, I guess it will end up as a single-node Cassandra cluster. Remember 
Cassandra is a distributed database; you will need more than one node to get 
performance and fault tolerance. I would also not recommend adding and removing 
cluster nodes at high frequency with application start/stop cycles.
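
(For the JUnit use case, this is roughly how such an in-JVM instance can be 
wired up with cassandra-unit - class names and the CQL file below are from 
memory / placeholders, so treat it as a sketch rather than a recipe:)

import org.cassandraunit.CassandraCQLUnit;
import org.cassandraunit.dataset.cql.ClassPathCQLDataSet;
import org.junit.Rule;
import org.junit.Test;

import static org.junit.Assert.assertNotNull;

public class EmbeddedCassandraTest {

    // Starts an embedded Cassandra in this JVM and loads schema.cql into
    // the keyspace test_ks before the test runs (both names are placeholders).
    @Rule
    public CassandraCQLUnit cassandra =
            new CassandraCQLUnit(new ClassPathCQLDataSet("schema.cql", "test_ks"));

    @Test
    public void canTalkToEmbeddedNode() {
        assertNotNull(cassandra.session
                .execute("SELECT release_version FROM system.local").one());
    }
}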

To help with getting things up and running, provide a small readme for 
downloading and starting Cassandra. For Mac and Linux, unpacking the tar.gz and 
running the cassandra start script is not too complicated. Or point to the 
DataStax Community Edition installers. Apart from installing Java, that is a 
five-minute stop to a single-node "Test Cluster".

Configuring a distributed setup is a bit - or a lot - more difficult and 
definitely needs more understanding and planning.

Just as a hint, and off-topic: I have seen people using Cassandra as 
application glue for interprocess communication, where every app server started 
a node (for communication, sessions, as a queue and so on). If that happens to 
be a use case for you, have a look at Hazelcast.

Jan

Sent from my iPhone

> On 14.02.2016 at 23:26, John Sanda wrote:
> 
> The motivation was to make it easy for someone to get up and running quickly 
> with the project. Clone the git repo, run the maven build, and then you are 
> all set. It definitely does lower the learning curve for someone just getting 
> started with a project and who is not really thinking about Cassandra. It 
> also is convenient for non-devs who need to quickly get the project up and 
> running. For development, we have people working on Linux, Mac OS X, and 
> Windows. I am not a Windows user and not even sure if ccm works on Windows, 
> so ccm can't be the de facto standard for development.
> 
>> On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky  
>> wrote:
>> What motivated the use of an embedded instance for development - as opposed 
>> to simply spawning a process for Cassandra?
>> 
>> 
>> 
>> -- Jack Krupansky
>> 
>>> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda  wrote:
>>> The project I work on day to day uses an embedded instance of Cassandra, 
>>> but it is intended for primarily for development. We embed Cassandra in a 
>>> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I 
>>> personally do not do this. I use and recommend ccm for development. If you 
>>> do use WildFly, there is also wildfly-cassandra which deploys Cassandra as 
>>> a custom WildFly extension. In other words it is deployed in WildFly like 
>>> other subsystems like EJB, web, etc, not like an application. There isn't a 
>>> whole lot of active development on this, but it could be another option.
>>> 
>>> For production, we have to support single node clusters (not embedded 
>>> though), and it has been challenging for pretty much all the reasons you 
>>> find people saying not to do so.
>>> 
>>> As for failure detection and cluster membership changes, are you using the 
>>> Datastax driver? You can register an event listener with the driver to 
>>> receive notifications for those things.
>>> 
 On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad  
 wrote:
 +1 to what jack said. Don't mess with embedded till you understand the 
 basics of the db. You're not making your system any less complex, I'd say 
 you're most likely going to shoot yourself in the foot. 
> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky  
> wrote:
> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain can 
> be avoided. Two nodes would not support HA. You need to be able to reach 
> a quorum, which is defined as n/2+1 where n is the number of replicas. 
> IOW, you cannot update the data if a quorum cannot be reached. The data 
> on any given node needs to be replicated on at least two other nodes.
> 
> Embedded Cassandra is only for extremely sophisticated developers - not 
> those who are new to Cassandra, with a "superficial understanding".
> 
> As a general proposition, you should not be running application code on 
> Cassandra nodes.
> 
> That said, if any of the senior Cassandra developers wish to personally 
> support your efforts towards embedded clusters, they are certainly free 
> to do so. we'll see if any of them step forward.
> 
> 
> -- Jack Krupansky
> 
>> On Sat, Feb 13, 2016 at 3:47 PM, Binil Thomas 
>>  wrote:
>> Hi all,
>> 
>> TL;DR: I have a very superficial understanding of Cassandra and am 
>> currently evaluating it for a project. 
>> 
>> * Can Cassandra be embedded into another JVM application? 
>> * Can such embedded instances form a cluster? 
>> * Can the application 

Re: Sudden disk usage

2016-02-13 Thread Jan Kesten
Hi,

what compaction strategy do you use? What you are seeing is most likely a 
compaction - think of 4 sstables of 50 GB each: compacting those can take up to 
an extra 200 GB while the new sstable is being written. After that the old ones 
are deleted and the space is freed again.

If you use SizeTieredCompactionStrategy you can end up with very large 
sstables, as I do (>250 GB each). In the worst case you could need twice the 
space - a reason why I set my disk monitoring threshold to 45% usage.

Just my 2 cents.
Jan

Sent from my iPhone

> On 13.02.2016 at 08:48, Branton Davis wrote:
> 
> One of our clusters had a strange thing happen tonight.  It's a 3 node 
> cluster, running 2.1.10.  The primary keyspace has RF 3, vnodes with 256 
> tokens.
> 
> This evening, over the course of about 6 hours, disk usage increased from 
> around 700GB to around 900GB on only one node.  I was at a loss as to what 
> was happening and, on a whim, decided to run nodetool cleanup on the 
> instance.  I had no reason to believe that it was necessary, as no nodes were 
> added or tokens moved (not intentionally, anyhow).  But it immediately 
> cleared up that extra space.
> 
> I'm pretty lost as to what would have happened here.  Any ideas where to look?
> 
> Thanks!
> 


Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Jan Kesten
Hi Rahul,

it should work as you would expect - simply copy over the sstables from
your extra disk to the original one. To minimize downtime of the node
you can do something like this:

- rsync the files while the node is still running (sstables are
immutable) to copy most of the data
- edit cassandra.yaml to remove the additional data dir
- shut down the node
- rsync again (just in case a new sstable got written while the
first rsync was running)
- restart

HTH
Jan

On 14.01.2016 at 08:38, Rahul Ramesh wrote:
> One update. I cleared the snapshot using nodetool clearsnapshot command.
> Disk space is recovered now. 
> 
> Because of this issue, I have mounted one more drive to the server and
> there are some data files there. How can I migrate the data so that I
> can decommission the drive? 
> Will it work if I just copy all the contents in the table directory to
> one of the drives? 
> 
> Thanks for all the help.
> 
> Regards,
> Rahul
> 
> On Thursday 14 January 2016, Rahul Ramesh <rr.ii...@gmail.com> wrote:
> 
> Hi Jan,
> I checked it. There are no old Key Spaces or tables.
> Thanks for your pointer, I started looking inside the directories. I
> see lot of snapshots directory inside the table directory. These
> directories are consuming space.
> 
> However these snapshots are not shown  when I issue listsnapshots
> ./bin/nodetool listsnapshots
> Snapshot Details: 
> There are no snapshots
> 
> Can I safely delete those snapshots? why listsnapshots is not
> showing the snapshots? Also in future, how can we find out if there
>     are snapshots?
> 
> Thanks,
> Rahul
> 
> 
> 
> On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten <j.kes...@enercast.de> wrote:
> 
> Hi Rahul,
> 
> just an idea, did you have a look at the data directorys on disk
> (/var/lib/cassandra/data)? It could be that there are some from
> old keyspaces that have been deleted and snapshoted before. Try
> something like "du -sh /var/lib/cassandra/data/*" to verify
> which keyspace is consuming your space.
> 
> Jan
> 
> Von meinem iPhone gesendet
> 
> On 14.01.2016 at 07:25, Rahul Ramesh <rr.ii...@gmail.com> wrote:
> 
>> Thanks for your suggestion. 
>>
>> Compaction was happening on one of the large tables. The disk
>> space did not decrease much after the compaction. So I ran an
>> external compaction. The disk space decreased by around 10%.
>> However it is still consuming close to 750Gb for load of 250Gb. 
>>
>> I even restarted cassandra thinking there may be some open
>> files. However it didnt help much. 
>>
>> Is there any way to find out why so much of data is being
>> consumed? 
>>
>> I checked if there are any open files using lsof. There are
>> not any open files.
>>
>> *Recovery:*
>> Just a wild thought 
>> I am using replication factor of 2 and I have two nodes. If I
>> delete complete data on one of the node, will I be able to
>> recover all the data from the active node? 
>> I don't want to pursue this path as I want to find out the
>> root cause of the issue! 
>>
>>
>> Any help will be greatly appreciated
>>
>> Thank you,
>>
>> Rahul
>>
>>
>>
>>
>>
>>
>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo <r...@pythian.com> wrote:
>>
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, than can temporary increase
>> disk space. But I think Carlos Alonso might be correct.
>> Running compactions might be the issue.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>  
>> Pythian - Love your data
>>
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649

Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Jan Kesten
Hi Rahul,

just an idea: did you have a look at the data directories on disk 
(/var/lib/cassandra/data)? It could be that there are some left from old 
keyspaces that have been deleted and snapshotted before. Try something like 
"du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming your 
space.

Jan

Sent from my iPhone

> On 14.01.2016 at 07:25, Rahul Ramesh wrote:
> 
> Thanks for your suggestion. 
> 
> Compaction was happening on one of the large tables. The disk space did not 
> decrease much after the compaction. So I ran an external compaction. The disk 
> space decreased by around 10%. However it is still consuming close to 750Gb 
> for load of 250Gb. 
> 
> I even restarted cassandra thinking there may be some open files. However it 
> didnt help much. 
> 
> Is there any way to find out why so much of data is being consumed? 
> 
> I checked if there are any open files using lsof. There are not any open 
> files.
> 
> Recovery:
> Just a wild thought 
> I am using replication factor of 2 and I have two nodes. If I delete complete 
> data on one of the node, will I be able to recover all the data from the 
> active node? 
> I don't want to pursue this path as I want to find out the root cause of the 
> issue! 
> 
> 
> Any help will be greatly appreciated
> 
> Thank you,
> 
> Rahul
> 
> 
> 
> 
> 
> 
>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo  wrote:
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, which can temporarily increase disk space. But I 
>> think Carlos Alonso might be correct. Running compactions might be the issue.
>> 
>> Regards,
>> 
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>  
>> Pythian - Love your data
>> 
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>> 
>>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso  wrote:
>>> I'd have a look also at possible running compactions.
>>> 
>>> If you have big column families with STCS then large compactions may be 
>>> happening.
>>> 
>>> Check it with nodetool compactionstats
>>> 
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
 On 13 January 2016 at 05:22, Kevin O'Connor  wrote:
 Have you tried restarting? It's possible there's open file handles to 
 sstables that have been compacted away. You can verify by doing lsof and 
 grepping for DEL or deleted. 
 
 If it's not that, you can run nodetool cleanup on each node to scan all of 
 the sstables on disk and remove anything that it's not responsible for. 
 Generally this would only work if you added nodes recently. 
 
 
> On Tuesday, January 12, 2016, Rahul Ramesh  wrote:
> We have a 2 node Cassandra cluster with a replication factor of 2. 
> 
> The load factor on the nodes is around 350Gb
> 
> Datacenter: Cassandra
> ==
> Address      Rack    Status  State   Load      Owns      Token
>                                                          -5072018636360415943
> 172.31.7.91  rack1   Up      Normal  328.5 GB  100.00%   -7068746880841807701
> 172.31.7.92  rack1   Up      Normal  351.7 GB  100.00%   -5072018636360415943
> 
> However,if I use df -h, 
> 
> /dev/xvdf   252G  223G   17G  94% /HDD1
> /dev/xvdg   493G  456G   12G  98% /HDD2
> /dev/xvdh   197G  167G   21G  90% /HDD3
> 
> 
> HDD1,2,3 contains only cassandra data. It amounts to close to 1Tb in one 
> of the machine and in another machine it is close to 650Gb. 
> 
> I started repair 2 days ago, after running repair, the amount of disk 
> space consumption has actually increased. 
> I also checked if this is because of snapshots. nodetool listsnapshot 
> intermittently lists a snapshot but it goes away after sometime. 
> 
> Can somebody please help me understand, 
> 1. why so much disk space is consumed?
> 2. Why did it increase after repair?
> 3. Is there any way to recover from this state.
> 
> 
> Thanks,
> Rahul
>> 
>> 
>> --
>> 
> 


ClosedChannelExcption while nodetool repair

2016-01-12 Thread Jan Kesten
Hi,

I have had some problems recently on my Cassandra cluster. I am running 12
nodes with 2.2.4, and the problems show up while repairing with a plain
"nodetool repair". In system.log I can find:

ERROR [STREAM-IN-/172.17.2.233] 2016-01-08 08:32:38,327
StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
Streaming error occurred
java.nio.channels.ClosedChannelException: null

on one node, and at the same time on the node mentioned in the log:

INFO  [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073
StreamResultFuture.java:168 - [Stream
#5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving
2 files(46708049 bytes), sending 2 files(1856721742 bytes)
ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325
StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe
unterbrochen (broken pipe)
at
org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144)
~[apache-cassandra-2.2.4.jar:2.2.4]



More complete log can be found here:

http://pastebin.com/n6DjCCed
http://pastebin.com/6rD5XNwU

I already did a nodetool scrub.

Any suggestions as to what is causing this?

Thanks in advance,
Jan


Strange Sizes after 2.1.3 upgrade

2015-03-03 Thread Jan Kesten

Hi,

I found something strange this morning on our secondary cluster. I recently 
upgraded to 2.1.3 - hoping for incremental repairs to work - and this morning 
OpsCenter showed me very unequal disk usages. Most irritating is that some 
nodes show data sizes of > 3 TB, but they only have 3 TB drives. I made a 
screenshot.


https://www.dropbox.com/s/0qhbpm1znwd07rj/strange_sizes.png?dl=0

Has this occurred anywhere else? Maybe it is totally unrelated to the 2.1.3 
upgrade.


Thanks for any pointers,
Jan






Re: Node stuck in joining the ring

2015-02-26 Thread Jan Kesten

Hi Batranut,

apart from the other suggestions - do you have ntp running on all your 
cluster nodes and are times in sync?


Jan


Re: Node joining take a long time

2015-02-20 Thread Jan Kesten

Hi,

a short hint for those upgrading: If you upgrade to 2.1.3 - there is a 
bug in the config builder when rpc_interface is used.  If you use 
rpc_address in your cassandra.yaml you will be fine - I ran into it this 
morning and filed an issue for it.


https://issues.apache.org/jira/browse/CASSANDRA-8839

Jan


Re: Nodetool clearsnapshot

2015-01-13 Thread Jan Kesten

Hi,

I have read that snapshots are basically symlinks and that they do not take 
much space.
Why, if I run nodetool clearsnapshot, does it free a lot of space? I am 
seeing GBs freed...


Both together make sense. Creating a snapshot just creates links for all 
files under the snapshot directory. This is very fast and takes no extra 
space. But those links are hard links, not symbolic ones.


After a while your running cluster will compact some of its sstables, writing 
the data to a new sstable and deleting the old ones. For example, if you had 
SSTable1..4 and a snapshot with links to those four, then after compaction you 
will have one active SSTable5 which is newly written and consumes space. The 
snapshot-linked sstables are still there, still consuming their space. Only 
when the snapshot is cleared do you get that disk space back.
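
A tiny demo of that hard-link behaviour outside of Cassandra (the paths are 
just examples):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HardLinkDemo {
    public static void main(String[] args) throws Exception {
        Path original = Paths.get("/tmp/sstable-data.db");   // stands in for a live sstable
        Path snapshot = Paths.get("/tmp/snapshot-link.db");   // stands in for the snapshot entry

        Files.write(original, new byte[10 * 1024 * 1024]);    // 10 MB of "data"
        Files.createLink(snapshot, original);                  // hard link: no extra space used

        Files.delete(original);                                // "compaction" removes the original
        // The bytes stay on disk as long as the hard link exists:
        System.out.println("snapshot still holds " + Files.size(snapshot) + " bytes");

        Files.delete(snapshot);                                // "clearsnapshot": now the space is freed
    }
}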


HTH,
Jan




Re: Replacing nodes disks

2014-12-22 Thread Jan Kesten

Hi,

even if recovery as for a dead node would work, backup and restore (like my 
approach with a USB docking station) will be much faster and have less I/O and 
CPU impact on your cluster.


Keep that in Mind :-)

Cheers,
Jan

On 22.12.2014 at 10:58, Or Sher wrote:

Great. replace_address works great.
From some reason I thought it won't work with the same IP.


On Sun, Dec 21, 2014 at 5:14 PM, Ryan Svihla <rsvi...@datastax.com> wrote:


Cassandra is designed to rebuild a node from other nodes, whether
a node is dead by your hand because you killed it or fate is
irrelevant, the process is the same, a new node can be the same
hostname and ip or it can have totally different ones.

On Sun, Dec 21, 2014 at 6:01 AM, Or Sher <or.sh...@gmail.com> wrote:

If I'll use the replace_address parameter with the same IP
address, would that do the job?

On Sun, Dec 21, 2014 at 11:20 AM, Or Sher <or.sh...@gmail.com> wrote:

What I want to do is kind of replacing a dead node -

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

But replacing it with a clean node with the same IP and
hostname.

On Sun, Dec 21, 2014 at 9:53 AM, Or Sher <or.sh...@gmail.com> wrote:

Thanks guys.
I have to replace all data disks, so I don't have
another large enough local disk to move the data to.
If I'll have no choice, I will backup the data before
on some other node or something, but I'd like to avoid it.
I would really love letting Cassandra do it thing and
rebuild itself.
Did anybody handled such cases that way (Letting
Cassandra rebuild it's data?)
Although there are no documented procedure for it, It
should be possible right?

On Fri, Dec 19, 2014 at 8:41 AM, Jan Kesten <j.kes...@enercast.de> wrote:

Hi Or,

I did some sort of this a while ago. If your
machines do have a free disk slot - just put
another disk there and use it as another
data_file_directory.

If not - as in my case:

- grab an usb dock for disks
- put the new one in there, plug in, format, mount
to /mnt etc.
- I did an online rsync from
/var/lib/cassandra/data to /mnt
- after that, bring cassandra down
- do another rsync from /var/lib/cassandra/data to
/mnt (should be faster, as sstables do not change,
minimizes downtime)
- if you need adjust /etc/fstab if needed
- shutdown the node
- swap disks
- power on the node
- everything should be fine ;-)

Of course you will need a replication factor > 1
for this to work ;-)

Just my 2 cents,
Jan

rsync the full contents there,

On 18.12.2014 at 16:17, Or Sher wrote:

Hi all,

We have a situation where some of our nodes
have smaller disks and we would like to align
all nodes by replacing the smaller disks to
bigger ones without replacing nodes.
We don't have enough space to put data on /
disk and copy it back to the bigger disks so
we would like to rebuild the nodes data from
other replicas.

What do you think should be the procedure here?

I'm guessing it should be something like this
but I'm pretty sure it's not enough.
1. shutdown C* node and server.
2. replace disks + create the same vg lv etc.
3. start C* (Normally?)
4. nodetool repair/rebuild?
*I think I might get some consistency issues
for use cases relying on Quorum reads and
writes for strong consistency.
What do you say?

Another question is (and I know it depends on
many factors but I'd like to hear an
experienced estimation): How much time would
take to rebuild a 250G data node

sstablemetadata and sstablerepairedset not working with DSC on Debian

2014-12-18 Thread Jan Kesten

Hi,

while being curious about the new incremental repairs, I updated our cluster 
to C* version 2.1.2 via the Debian apt repository. Everything went quite well, 
but trying to start the tools sstablemetadata and sstablerepairedset leads to 
the following error:


root@a01:/home/ifjke# sstablerepairedset
Error: Could not find or load main class 
org.apache.cassandra.tools.SSTableRepairedAtSetter

root@a01:/home/ifjke#

Looking at the scripts that start these tools, I found that the Java 
classpath is built via

for jar in `dirname $0`/../../lib/*.jar; do
CLASSPATH=$CLASSPATH:$jar
done

Because the scripts are located in /usr/bin/, this leads to searching for 
libs in /lib. Obviously there are no Java or Cassandra libraries 
there - nodetool instead uses a different way:


if [ x$CASSANDRA_INCLUDE = x ]; then
for include in `dirname $0`/cassandra.in.sh \
   $HOME/.cassandra.in.sh \
   /usr/share/cassandra/cassandra.in.sh \
/usr/local/share/cassandra/cassandra.in.sh \
   /opt/cassandra/cassandra.in.sh; do
if [ -r $include ]; then
. $include
break
fi
done
elif [ -r $CASSANDRA_INCLUDE ]; then
. $CASSANDRA_INCLUDE
fi

I created a simple patch which makes both sstablemetadata and 
sstablerepairedset work for me - maybe it is worth sharing:


---SNIP---

--- sstablerepairedset2014-11-11 15:50:02.0 +
+++ sstablerepairedset_new2014-12-18 07:52:26.967368891 +
@@ -16,22 +16,19 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-if [ x$CLASSPATH = x ]; then
-
-# execute from the build dir.
-if [ -d `dirname $0`/../../build/classes ]; then
-for directory in `dirname $0`/../../build/classes/*; do
-CLASSPATH=$CLASSPATH:$directory
-done
-else
-if [ -f `dirname $0`/../lib/stress.jar ]; then
-CLASSPATH=`dirname $0`/../lib/stress.jar
+if [ x$CASSANDRA_INCLUDE = x ]; then
+for include in `dirname $0`/cassandra.in.sh \
+   $HOME/.cassandra.in.sh \
+   /usr/share/cassandra/cassandra.in.sh \
+ /usr/local/share/cassandra/cassandra.in.sh \
+   /opt/cassandra/cassandra.in.sh; do
+if [ -r $include ]; then
+. $include
+break
 fi
-fi
-
-for jar in `dirname $0`/../../lib/*.jar; do
-CLASSPATH=$CLASSPATH:$jar
 done
+elif [ -r $CASSANDRA_INCLUDE ]; then
+. $CASSANDRA_INCLUDE
 fi

 # Use JAVA_HOME if set, otherwise look for java in PATH


---SNIP---

Worked for me on both tools.

Jan


Re: Replacing nodes disks

2014-12-18 Thread Jan Kesten

Hi Or,

I did something like this a while ago. If your machines have a free 
disk slot, just put another disk in and use it as an additional 
data_file_directories entry.


If not - as in my case:

- grab a USB dock for disks
- put the new one in there, plug it in, format, mount to /mnt etc.
- I did an online rsync from /var/lib/cassandra/data to /mnt
- after that, bring cassandra down
- do another rsync from /var/lib/cassandra/data to /mnt (should be 
faster, as sstables do not change; this minimizes downtime)
- adjust /etc/fstab if needed
- shut down the node
- swap disks
- power on the node
- everything should be fine ;-)

Of course you will need a replication factor > 1 for this to work ;-)

Just my 2 cents,
Jan

rsync the full contents there,

On 18.12.2014 at 16:17, Or Sher wrote:

Hi all,

We have a situation where some of our nodes have smaller disks and we 
would like to align all nodes by replacing the smaller disks to bigger 
ones without replacing nodes.
We don't have enough space to put data on / disk and copy it back to 
the bigger disks so we would like to rebuild the nodes data from other 
replicas.


What do you think should be the procedure here?

I'm guessing it should be something like this but I'm pretty sure it's 
not enough.

1. shutdown C* node and server.
2. replace disks + create the same vg lv etc.
3. start C* (Normally?)
4. nodetool repair/rebuild?
*I think I might get some consistency issues for use cases relying on 
Quorum reads and writes for strong consistency.

What do you say?

Another question is (and I know it depends on many factors but I'd 
like to hear an experienced estimation): How much time would take to 
rebuild a 250G data node?


Thanks in advance,
Or.

--
Or Sher




Re: Cassandra schema migrator

2014-11-25 Thread Jan Kesten

Hi Jens,

maybe you should have a look at mutagen for cassandra:

https://github.com/toddfast/mutagen-cassandra

It has been a little quiet around it for some months, but maybe it is still worth a look.

Cheers,
Jan

On 25.11.2014 at 10:22, Jens Rantil wrote:

Hi,

Anyone who is using, or could recommend, a tool for versioning 
schemas/migrating in Cassandra? My list of requirements is:

 * Support for adding tables.
 * Support for versioning of table properties. All our tables are to 
be defaulted to LeveledCompactionStrategy.

 * Support for adding non-existing columns.
 * Optional: Support for removing columns.
 * Optional: Support for removing tables.

We are preferably a Java shop, but could potentially integrate 
something non-Java. I understand I could write a tool that would make 
these decisions using system.schema_columnfamilies and 
system.schema_columns, but as always reusing a proven tool would be 
preferable.


So far I only know of Spring Data Cassandra that handles creating 
tables and adding columns. However, it does not handle table 
properties in any way.


Thanks,
Jens

——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se 
Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter




repair -pr does not return

2014-05-02 Thread Jan Kesten

Hello together,

I'm running a Cassandra cluster with 2.0.6 and 6 nodes. As far as I 
know, routine repairs are still mandatory for handling tombstones - and I 
noticed that the cluster now does a snapshot repair by default.


Now my cluster has been running for a while and has a load of about 200 GB per 
node - running nodetool repair -pr on one of the nodes seems to run 
forever; right now it has been running for 2 complete days and does not return.


Any suggestions?

Thanks in advance,
Jan




Re: repair -pr does not return

2014-05-02 Thread Jan Kesten

Hi Duncan,

is it actually doing something or does it look like it got stuck?  
2.0.7 has a fix for a getting stuck problem.


it starts with sending merkle trees and streaming for some time (some 
hours in fact) and then just seems to hang. So I'll try to update and 
see if that solves the issue. Thanks for that hint!


Cheers,
Jan




Re: Cassandra Disk storage capacity

2014-04-07 Thread Jan Kesten

Hi Hari,

C* will use your entire space - that is something one should monitor. 
Depending on your choice of compaction strategy, your data dir should not 
be filled up entirely - in the worst case compaction needs as much free space 
as the sstables already on disk, therefore about 50% should be kept free.


The parameters used for on-disk storage are commitlog_directory, 
data_file_directories and saved_caches_directory. The parameter 
data_file_directories is plural: you can easily put more than one 
directory here (and you should do this instead of using RAID).


Cheers,
Jan

On 07.04.2014 at 12:56, Hari Rajendhran wrote:

Hi Team,

We have a 3 node Apache Cassandra 2.0.4 setup installed in our lab setup. We 
have set the data directory to /var/lib/cassandra/data. What would be the 
maximum disk storage that will be used for cassandra data storage?

Note : the /var partition has a storage capacity of 40GB.

My question is whether cassandra will use the entire / directory for data 
storage?

If no, how do I specify multiple directories for data storage?





Best Regards
Hari Krishnan Rajendhran
Hadoop Admin
DESS-ABIM ,Chennai BIGDATA Galaxy
Tata Consultancy Services
Cell:- 9677985515
Mailto: hari.rajendh...@tcs.com
Website: http://www.tcs.com

Experience certainty. IT Services
Business Solutions
Consulting






--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9
enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel   HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Dipl. Ing. Thomas Landgraf, Bernd Kratz




Re: Cassandra Disk storage capacity

2014-04-07 Thread Jan Kesten

On 07.04.2014 at 13:24, Hari Rajendhran wrote:
1) I am confused why cassandra uses the entire disk space ( / 
Directory) even when we specify /var/lib/cassandra/data as the 
directory in Cassandra.yaml file
2) Is it only during compaction ,cassandra will use the entire Disk 
space ?
3) What is the best way to monitor the cassandra Disk usage ?? is 
there a opensource monitoring tool for this ??


Hi,

if your / and /var/lib/cassandra/data are on different disks (or 
partitions), only /var/lib/cassandra/data will get filled entirely. Often 
this is not the case by default and you will have to create these 
mount points yourself. Also keep the commitlogs on a 
separate disk from the data to improve performance.


The extra space is only needed during compaction - but Cassandra will 
fire up compactions by itself, so you must keep this free space 
available all the time. This is the case for SizeTieredCompaction; Leveled 
or hybrid compaction strategies are cheaper on disk space.


For the last point - there are many tools to monitor your servers inside 
your cluster. Nagios, Hyperic HQ and OpenNMS are some of them - you can 
define alerts which keep you up to date.


Cheers,
jan


Re: Corrupted sstable and sstableloader

2013-07-22 Thread Jan Kesten

On 18.07.2013 19:19, Robert Coli wrote:


Why not just determine which SSTable is corrupt, remove it from the 
restore set, then run a repair when you're done to be totally sure all 
data is on all nodes?


This is what I did in the end - it was quite some work, since sstableloader 
just stopped with an exception but gave no hint which file was affected. So I 
replayed the sstables one by one and finally found the corrupt one.


Thanks to all,
Jan

--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9
enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel   HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Dipl. Ing. Thomas Landgraf, Bernd Kratz




Corrupted sstable and sstableloader

2013-07-18 Thread Jan Kesten

Hello together,

today I experienced a problem while loading a snapshot from our 
Cassandra cluster into a test cluster. The cluster has six nodes; I took 
a snapshot on all nodes concurrently and tried to import them into the 
other cluster.


For 5 out of 6 nodes the import went well with no errors. But the 
snapshot of one node cannot be imported - I tried several times. I got 
the following while running sstableloader:


ERROR 09:13:06,084 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.io.IOException: Datenübergabe 
unterbrochen (broken pipe)

at com.google.common.base.Throwables.propagate(Throwables.java:160)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)

at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
at 
org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

... 3 more
Exception in thread Streaming to /172.17.2.216:1 
java.lang.RuntimeException: java.io.IOException: Datenübergabe 
unterbrochen (broken pipe)

at com.google.common.base.Throwables.propagate(Throwables.java:160)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
at 
sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)

at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
at 
org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
at 
org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)

... 3 more

I suspect that the sstable on the node is corrupted in some way - and a 
scrub and repair should fix that I suppose.


Since the original cluster has a replication factor of 3 - shouldn't the 
import from 5 of the 6 snapshots contain all data? Or is the sstableloader 
tool too clever and avoids importing duplicate data?


Thanks for hints,
Jan

--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9
enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel   HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Dipl. Ing. Thomas Landgraf, Bernd Kratz




Re: Corrupted sstable and sstableloader

2013-07-18 Thread Jan Kesten
Hi,

I think it might be corrupted due to a power outage. Apart from this issue, 
reading the data with consistency level QUORUM (I have three replicas) did not 
produce an error - only the import into a different cluster did.

So, if I import all nodes except the one with the corrupted sstable - shouldn't 
I be importing two of the three replicas, so that the data is complete?


Sent from my iPhone

On 18.07.2013 at 19:06, sankalp kohli <kohlisank...@gmail.com> wrote:

 sstable might be corrupted due to bad disk. In that case, replication does 
 not matter.
 
 
 On Thu, Jul 18, 2013 at 8:52 AM, Jan Kesten j.kes...@enercast.de wrote:
 Hello together,
 
 today I experienced a problem while loading a snapshot from our cassandra 
 cluster to test cluster. The cluster has six nodes and I took a snapshot 
 from all nodes concurrently and tried to import them in the other cluster.
 
 From 5 out of 6 nodes importing went well with no errors. But one snapshot 
 of one node cannot be imported - I tried several times. I got the following 
 while running sstableloader:
 
 ERROR 09:13:06,084 Error in ThreadPoolExecutor
 java.lang.RuntimeException: java.io.IOException: Datenübergabe unterbrochen 
 (broken pipe)
 at com.google.common.base.Throwables.propagate(Throwables.java:160)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
 at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
 at 
 sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
 at 
 org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
 at 
 org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 ... 3 more
 Exception in thread Streaming to /172.17.2.216:1 
 java.lang.RuntimeException: java.io.IOException: Datenübergabe unterbrochen 
 (broken pipe)
 at com.google.common.base.Throwables.propagate(Throwables.java:160)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:724)
 Caused by: java.io.IOException: Datenübergabe unterbrochen (broken pipe)
 at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
 at 
 sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:420)
 at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:552)
 at 
 org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:93)
 at 
 org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
 ... 3 more
 
 I suspect that the sstable on that node is corrupted in some way - a 
 scrub and repair should fix it, I suppose.
 
 Since the original cluster has a replication factor of 3 - shouldn't the 
 import from 5 of 6 snapshots contain all data? Or is the sstableloader tool 
 clever enough to avoid importing duplicate data?
 
 Thanks for hints,
 Jan
 
 


Re: CorruptBlockException - recover?

2013-07-05 Thread Jan Kesten

Hi,

I tried to scrub the keyspace - but with no success either; the process 
threw an exception when it hit the corrupt block and then stopped. I 
will re-bootstrap the node :-)


Thanks anyways,
Jan

On 03.07.2013 19:10, Glenn Thompson wrote:
For what it's worth, I did this when I had this problem. It didn't 
work out for me. Perhaps I did something wrong.



On Wed, Jul 3, 2013 at 11:06 AM, Robert Coli rc...@eventbrite.com wrote:


On Wed, Jul 3, 2013 at 7:04 AM, ifjke j.kes...@enercast.de wrote:

I found that one of my cassandra nodes died recently (the machine
hung). I restarted the node and ran a nodetool repair; while it was
running it threw an
org.apache.cassandra.io.compress.CorruptBlockException.
Is there any way to recover from this? Or would it be best to
delete the node's contents and bootstrap it again?


If you scrub this SSTable (either with the online or offline
version of scrub) it will remove the corrupt data and re-write
the rest of the SSTable which isn't corrupt into a new SSTable.
That is probably safer for your data than deleting the entire set
of data on this replica. When that's done, restart the repair.

=Rob








Problem setting up encrypted communication

2013-03-13 Thread Jan Kesten

Hello together,

after my initial tests everything is up and running; replacing a dead node 
was no problem at all. Now I tried to set up encryption between nodes. I set 
up keystores and a truststore as described in the docs. Every node has 
its own keystore with one private key, plus a truststore with all 
public keys/certs imported.
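
For reference, roughly the keytool steps behind such a setup - a sketch only, 
with illustrative aliases/filenames and the passwords from the cassandra.yaml 
further down, not my literal shell history:

keytool -genkeypair -alias db01 -keyalg RSA -dname "CN=db01" \
        -keystore db01.keystore -storepass cassandra -keypass cassandra
keytool -exportcert -alias db01 -keystore db01.keystore -storepass cassandra -file db01.cer
keytool -importcert -alias db01 -file db01.cer -keystore .truststore -storepass cassandra -noprompt

(and the same genkeypair/export/import for db02 against its own keystore)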


for my first node:

db02, Mar 13, 2013, PrivateKeyEntry,
Certificate fingerprint (SHA1): 
D3:B1:37:8A:05:43:F1:7A:F9:70:7A:4C:91:6F:09:96:BF:75:21:81


for my second node:

db01, Mar 13, 2013, PrivateKeyEntry,
Certificate fingerprint (SHA1): 
BA:E9:F4:06:15:AE:CC:79:18:8B:69:C0:70:EF:19:82:0E:81:76:E8


shared truststore:

db02, Mar 13, 2013, trustedCertEntry,
Certificate fingerprint (SHA1): 
D3:B1:37:8A:05:43:F1:7A:F9:70:7A:4C:91:6F:09:96:BF:75:21:81

db01, Mar 13, 2013, trustedCertEntry,
Certificate fingerprint (SHA1): 
BA:E9:F4:06:15:AE:CC:79:18:8B:69:C0:70:EF:19:82:0E:81:76:E8


relevant part of cassandra.yaml (identical on both nodes except for the db01/db02 keystore path):

server_encryption_options:
    internode_encryption: all
    keystore: /home/cassandra/certs/db01.keystore
    keystore_password: cassandra
    truststore: /home/cassandra/certs/.truststore
    truststore_password: cassandra

Now the question that puzzles me: if I disable encryption and start both 
nodes, they join each other and I have a working cluster. If I enable 
encryption they no longer join, and I end up with two separate nodes.
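
One thing that might help with debugging - a sketch only, assuming the 
standard JVM SSL debug switch and that JVM options are added via 
conf/cassandra-env.sh:

# print SSL handshake details on startup (goes to the process' stdout/log)
JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl:handshake"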


Any hints?

Thanks,
Jan



Replacing dead node when num_tokens is used

2013-03-05 Thread Jan Kesten

Hello,

while trying out Cassandra I read about the steps necessary to replace a 
dead node. In my test cluster I use a setup with num_tokens instead of 
initial_token. How do I replace a dead node in this scenario?


Thanks,
Jan


Re: Replacing dead node when num_tokens is used

2013-03-05 Thread Jan Kesten

Hello Aaron,

thanks for your reply.

Found it just an hour ago on my own; yesterday I had accidentally looked at 
the 1.0 docs. Right now my replacement node is streaming from the others 
- then more testing can follow.
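
For the archives, a rough sketch of the procedure as I read it in the 1.2 
docs (the IP is a placeholder, and the exact system property can depend on 
the minor version): start the replacement node with the same configuration 
but empty data directories and pass the dead node's address, e.g.

cassandra -Dcassandra.replace_address=<ip_of_dead_node>

or, for a package install, append the property to JVM_OPTS in 
conf/cassandra-env.sh before starting the service.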


Thanks again,
Jan