RE: Data Modelling Help

2015-04-29 Thread Donald Smith
Secondary indices are inefficient and, as far as I know, deprecated.

Unless you store many thousands of emails for a long time (which I recommend 
against), just use a single table with the userid as the partition key and the 
timestamp as the clustering (column) key, as in your schema.   You might want 
to use a TTL to expire old emails.
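
For example, against your schema below (the literal values and the 180-day TTL 
are just illustrative):

ALTER TABLE TIMELINE WITH default_time_to_live = 15552000;  -- table-wide default

-- or per insert:
INSERT INTO TIMELINE (userID, emailID, timestamp, read)
VALUES ('12', 'e42', 1430159160000, false) USING TTL 15552000;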

If you need to store a huge number of emails, consider splitting into tables by 
year, for example.

If you had two tables (one for read emails and one for unread emails), you’d 
have to move rows between them whenever an email got marked (un)read.  But it 
would support finding (un)read emails efficiently.
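
A minimal sketch of that two-table layout (table names and literal values here 
are illustrative):

CREATE TABLE unread_timeline (
    userID varchar,
    timestamp bigint,
    emailID varchar,
    PRIMARY KEY (userID, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

CREATE TABLE read_timeline (
    userID varchar,
    timestamp bigint,
    emailID varchar,
    PRIMARY KEY (userID, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

-- marking an email read moves the row:
BEGIN BATCH
    DELETE FROM unread_timeline WHERE userID = '12' AND timestamp = 1430159160000;
    INSERT INTO read_timeline (userID, timestamp, emailID)
    VALUES ('12', 1430159160000, 'e42');
APPLY BATCH;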

Don

From: Sandeep Gupta [mailto:sandy@gmail.com]
Sent: Monday, April 27, 2015 11:46 AM
To: user@cassandra.apache.org
Subject: Fwd: Data Modelling Help

Hi,

I am a newbie with Cassandra and need data-modelling help, as I haven't found 
a resource that tackles the same problem.

The use case is similar to an email system. I want to store a timeline of all 
emails a user has received and then fetch them back in three different ways:

1. All emails ever received
2. Mails that have been read by a user
3. Mails that are still unread by a user

My current model is as follows:

CREATE TABLE TIMELINE (
userID varchar,
emailID varchar,
timestamp bigint,
read boolean,
PRIMARY KEY (userID, timestamp)
) WITH CLUSTERING ORDER BY (timestamp desc);

CREATE INDEX ON TIMELINE (read);

The queries I need to support are:

SELECT * FROM TIMELINE where userID = 12;
SELECT * FROM TIMELINE where userID = 12 order by timestamp asc;
SELECT * FROM TIMELINE where userID = 12 and read = true;
SELECT * FROM TIMELINE where userID = 12 and read = false;
SELECT * FROM TIMELINE where userID = 12 and read = true order by timestamp asc;
SELECT * FROM TIMELINE where userID = 12 and read = false order by timestamp 
asc;


Queries are:

1. Should I keep read as my secondary index, given that it will be frequently 
updated and can create tombstones? Per 
http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_when_use_index_c.html that's a 
problem.

2. Can we do an inequality check on a secondary index? I found that at least 
one equality condition must be present on the indexed column.

3. If this is not the right way to model it, please suggest how to support the 
above queries (one alternative is sketched below). Maintaining three different 
tables worries me because of the number of insertions (for read/unread), since 
the number of users * emails viewed per day will be huge.
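
For illustration, one alternative sketch makes read a clustering column, which 
serves the read/unread queries without a secondary index; the trade-off is 
that flipping the flag becomes a DELETE plus an INSERT rather than an UPDATE:

CREATE TABLE timeline_by_status (
    userID varchar,
    read boolean,
    timestamp bigint,
    emailID varchar,
    PRIMARY KEY (userID, read, timestamp)
) WITH CLUSTERING ORDER BY (read ASC, timestamp DESC);

-- unread emails, newest first:
SELECT * FROM timeline_by_status WHERE userID = '12' AND read = false;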


Thanks in advance.

Best Regards!





Keep Walking,
~ Sandeep





Cassandra hanging in IntervalTree.comparePoints() and in CompactionController.maxPurgeableTimestamp()

2015-04-29 Thread Donald Smith
We deployed a brand-new 13-node 2.1.4 C* cluster and used sstableloader to 
stream about 500GB into Cassandra.   The streaming took less than a day, but 
afterwards the pending compactions do not decrease.  The Cassandra nodes (which 
have about 500 pending compactions each) seem to spend most of their time in

IntervalTree.comparePoints() and in
CompactionController.maxPurgeableTimestamp()

(sometimes, too, in 
com.google.common.util.concurrent.Uninterruptibles.sleepUninterruptibly()).  
That's what Java VisualVM's sampler shows they're doing.

Many of the nodes show 100% cpu usage per core.

Any idea what's causing it to hang?

Might http://qnalist.com/questions/5818079/reasons-for-nodes-not-compacting or 
https://issues.apache.org/jira/browse/CASSANDRA-8914 explain it?  Altering the 
table to set 'cold_reads_to_omit': 0.0 didn't help.
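
For reference, the alter was presumably of this form (the table name and the 
other compaction options here are illustrative):

ALTER TABLE my_table WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'cold_reads_to_omit': 0.0
};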

  Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



Tables showing up as our_table-147a2090ed4211e480153bc81e542ebd/ in data dir

2015-04-28 Thread Donald Smith
Using 2.1.4, tables in our data/ directory are showing up as


our_table-147a2090ed4211e480153bc81e542ebd/


instead of as


 our_table/


Why would that happen? We're also seeing lagging compactions and high cpu usage.


 Thanks, Don


Questions about bootrapping and compactions during bootstrapping

2014-12-16 Thread Donald Smith
Looking at the output of nodetool netstats, I see that the bootstrapping node 
is pulling from only two of the nine nodes currently in the datacenter.   That 
surprises me: I'd think the vnodes it pulls from would be randomly spread 
across the existing nodes.  We're using Cassandra 2.0.11 with 256 vnodes each.

I also notice that while bootstrapping, the node is quite busy doing 
compactions.   There are over 1000 pending compactions on the new node and it's 
not finished bootstrapping. I'd think those would be unnecessary, since the 
other nodes in the data center have zero pending compactions.  Perhaps the 
compactions explain why running du -hs /var/lib/cassandra/data on the new 
node shows more disk space usage than on the old nodes.

Is it reasonable to run nodetool disableautocompaction on the bootstrapping 
node? Should that be the default?

If I start bootstrapping one node, it's not yet in the cluster but it decides 
which token ranges it owns and requests streams for that data. If  I then try 
to bootstrap a SECOND node concurrently, it will take over ownership of some 
token ranges from the first node. Will the first node then adjust what data it 
streams?

It seems to me the cassandra server needs to keep track of both the OLD token 
ranges and vnodes and the NEW ones.  I'm not convinced that running two 
bootstraps concurrently (starting the second one after several minutes of 
delay) is safe.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



RE: stream_throughput_outbound_megabits_per_sec

2014-10-22 Thread Donald Smith
Sorry, I copy-and-pasted the wrong variable name.  I meant to copy and paste 
streaming_socket_timeout_in_ms. So my question should be:

streaming_socket_timeout_in_ms is the timeout per operation on the streaming 
socket.   The docs recommend not setting it too low (because a timeout causes 
streaming to restart from the beginning), but the default of 0 never times 
out.  What's a reasonable value?


# Enable socket timeout for streaming operation.
# When a timeout occurs during streaming, streaming is retried from the start
# of the current file. This _can_ involve re-streaming an important amount of
# data, so you should avoid setting the value too low.
# Default value is 0, which never timeout streams.
# streaming_socket_timeout_in_ms: 0

My second question is: Does it stream an entire SSTable in one operation? I 
doubt it.  How large is the object it streams in one operation?  I'm tempted to 
set the timeout to 30 seconds or 1 minute. Is that too low?

The entire file (SSTable) is large – several hundred megabytes.  Is the timeout 
for streaming the entire file?  Or only a block of it?

Don

From: Marcus Eriksson [mailto:krum...@gmail.com]
Sent: Friday, October 17, 2014 4:05 AM
To: user@cassandra.apache.org
Subject: Re: stream_throughput_outbound_megabits_per_sec



On Thu, Oct 16, 2014 at 1:54 AM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:


stream_throughput_outbound_megabits_per_sec  is the timeout per operation on 
the streaming socket.   The docs recommend not to have it too low (because a 
timeout causes streaming to restart from the beginning). But the default 0 
never times out.  What's a reasonable value?

no, it is not a timeout, it states how fast sstables are streamed


Does it stream an entire SSTable in one operation? I doubt it.  How large is 
the object it streams in one operation?  I'm tempted to put the timeout at 30 
seconds or 1 minute. Is that too low?

unsure what you mean by 'operation' here, but it is one tcp connection, 
streaming the whole file (if that's what we want)


/Marcus


Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
Question about the read path in cassandra.  If a partition/row is in the 
Memtable and is being actively written to by other clients,  will a READ of 
that partition also have to hit SStables on disk (or in the page cache)?  Or 
can it be serviced entirely from the Memtable?

If you select all columns (e.g., select * from ...) then I can imagine 
that cassandra would need to merge whatever columns are in the Memtable with 
what's in SStables on disk.

But if you select a single column (e.g., select Name from ... where id= ...) 
and if that column is in the Memtable, I'd hope cassandra could skip checking 
the disk.  Can it do this optimization?

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



RE: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
On the cassandra irc channel I discussed this question.  I learned that the 
timestamp in the Memtable may be OLDER than the timestamp in some SSTable 
(e.g., due to hints or retries).  So there’s no guarantee that the Memtable has 
the most recent version.

But there may be cases, they say, in which the time stamp in the SSTable can be 
used to skip over SSTables that have older data (via metadata on SSTables, I 
presume).

Memtables are like write-through caches and do NOT correspond to SSTables 
loaded from disk.

From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On Behalf Of 
Jonathan Haddad
Sent: Wednesday, October 22, 2014 9:24 AM
To: user@cassandra.apache.org
Subject: Re: Is cassandra smart enough to serve Read requests entirely from 
Memtables in some cases?

No.  Consider a scenario where you supply a timestamp a week in the future, 
flush it to an sstable, and then do a write with the current timestamp.  The 
record on disk will have a timestamp greater than the one in the memtable.

On Wed, Oct 22, 2014 at 9:18 AM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
Question about the read path in cassandra.  If a partition/row is in the 
Memtable and is being actively written to by other clients,  will a READ of 
that partition also have to hit SStables on disk (or in the page cache)?  Or 
can it be serviced entirely from the Memtable?

If you select all columns (e.g., “select * from ….”)   then I can imagine that 
cassandra would need to merge whatever columns are in the Memtable with what’s 
in SStables on disk.

But if you select a single column (e.g., “select Name from ….  where id= ….”) 
and if that column is in the Memtable, I’d hope cassandra could skip checking 
the disk.  Can it do this optimization?

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]




--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


Re: Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

2014-10-15 Thread Donald Smith
Even with vnodes, when you add a node to a cluster, it takes over some portions 
of the token range.  If the other nodes have been running for a long time you 
should bootstrap the new node, so it gets old data.  Then you should run 
nodetool cleanup on the other nodes to eliminate no-longer-needed rows which 
now belong to the new node.

So, my point is that to avoid the need to bootstrap and to cleanup, it's better 
to bring all nodes up at about the same time.  If this is wrong, please explain 
why.


Thanks, Don


From: Robert Coli rc...@eventbrite.com
Sent: Wednesday, October 15, 2014 1:54 PM
To: user@cassandra.apache.org
Subject: Re: Question about adding nodes incrementally to a new datacenter: 
wait til all hosts come up so they can learn the token ranges?

On Tue, Oct 14, 2014 at 4:52 PM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
Suppose I create a new DC with 25 nodes. I have their IPs in 
cassandra-topology.properties.  Twenty-three of the nodes start up, but two of 
the nodes fail to start.   If I start replicating (via nodetool rebuild) 
without those two nodes, then when those 2 nodes enter the DC the distribution 
of tokens to vnodes will change and I'd need to rebuild or bootstrap, right?

In other words, it's better to wait til all nodes come up before we start 
replicating.  Does this sound right?

I presume that all the nodes need to come up so they can learn the token ranges.

I don't understand your question. Vnodes exist to randomly distribute data on 
each physical node into [n] virtual node chunks, 256 by default.

They do this in order to allow you to add 2 nodes to your 25 node cluster 
without rebalancing the prior 23.

The simplest way to illustrate this is to imagine a token range of 0-20 in a 4 
node cluster with RF=1.

A 0-5
B 5-10
C 10-15
D 15-20 (0)

Each node has 25% of the data. If you add a new node E, and want it to join 
with 25% of the data, there is literally nowhere you can have it join to 
accomplish this goal. You have to join it in between one of the existing nodes, 
and then move each of those nodes so that the distribution is even again. This 
is why, prior to vnodes, the best practice was to double your cluster size.

=Rob
http://twitter.com/rcolidba



stream_throughput_outbound_megabits_per_sec

2014-10-15 Thread Donald Smith

stream_throughput_outbound_megabits_per_sec  is the timeout per operation on 
the streaming socket.   The docs recommend not to have it too low (because a 
timeout causes streaming to restart from the beginning). But the default 0 
never times out.  What's a reasonable value?

Does it stream an entire SSTable in one operation? I doubt it.  How large is 
the object it streams in one operation?  I'm tempted to put the timeout at 30 
seconds or 1 minute. Is that too low?

Some of our rebuilds hang for many hours and we figure we need a timeout.

Thanks, Don


Question about adding nodes incrementally to a new datacenter: wait til all hosts come up so they can learn the token ranges?

2014-10-14 Thread Donald Smith
Suppose I create a new DC with 25 nodes. I have their IPs in 
cassandra-topology.properties.  Twenty-three of the nodes start up, but two of 
the nodes fail to start.   If I start replicating (via nodetool rebuild) 
without those two nodes, then when those 2 nodes enter the DC the distribution 
of tokens to vnodes will change and I'd need to rebuild or bootstrap, right?

In other words, it's better to wait til all nodes come up before we start 
replicating.  Does this sound right?

I presume that all the nodes need to come up so they can learn the token ranges.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



timeout for port 7000 on stateful firewall? streaming_socket_timeout_in_ms?

2014-09-29 Thread Donald Smith
We have a stateful firewall (http://en.wikipedia.org/wiki/Stateful_firewall) 
between data centers for port 7000 (inter-cluster). How long should the idle 
timeout be for the connections on the firewall?

Similarly what's appropriate for streaming_socket_timeout_in_ms in 
cassandra.yaml?  The default is 0 (no timeout).  I presume that 
streaming_socket_timeout_in_ms refers to streams such as for bootstrapping and 
rebuilding.

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



RE: Would warnings about overlapping SStables explain high pending compactions?

2014-09-25 Thread Donald Smith
Version 2.0.9.   We have 11 ongoing compactions on that node.

From: Marcus Eriksson [mailto:krum...@gmail.com]
Sent: Thursday, September 25, 2014 12:45 AM
To: user@cassandra.apache.org
Subject: Re: Would warnings about overlapping SStables explain high pending 
compactions?

Not really

What version are you on? Do you have pending compactions and no ongoing 
compactions?

/Marcus

On Wed, Sep 24, 2014 at 11:35 PM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
On one of our nodes we have lots of pending compactions (499).  In the past 
we’ve seen pending compactions go up to 2400 and all the way back down again.

Investigating, I saw warnings such as the following in the logs about 
overlapping SStables and about needing to run “nodetool scrub” on a table.  
Would the overlapping SStables explain the pending compactions?

WARN [RMI TCP Connection(2)-10.5.50.30] 2014-09-24 09:14:11,207 
LeveledManifest.java (line 154) At level 1, 
SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC-jb-388233-Data.db') 
[DecoratedKey(-6112875836465333229, 
3366636664393031646263356234663832383264616561666430383739383738), 
DecoratedKey(-4509284829153070912, 
3366336562386339376664376633353635333432636662373739626465393636)] overlaps 
SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC_blob-jb-388150-Data.db') 
[DecoratedKey(-4834684725563291584, 
336633623334363664363632666365303664333936336337343566373838), 
DecoratedKey(-4136919579566299218, 
3366613535646662343235336335633862666530316164323232643765323934)].  This could 
be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have 
dropped sstables from another node into the data directory. Sending back to L0. 
 If you didn't drop in sstables, and have not yet run scrub, you should do so 
since you may also have rows out-of-order within an sstable

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]




Experience with multihoming cassandra?

2014-09-25 Thread Donald Smith
We have large boxes with 256G of RAM and SSDs.  From iostat, top, and sar we 
think the system has excess capacity.  Anyone have recommendations about 
multihoming (http://en.wikipedia.org/wiki/Multihoming) cassandra on such a node 
(connecting it to multiple IPs and running multiple cassandras simultaneously)? 
I'm skeptical, since Cassandra already has built-in multi-threading and since 
if the node went down, multiple nodes would disappear.  We're using C* version 
2.0.9.

A Google/Bing search for "multihoming cassandra" doesn't turn up much.

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



Adjusting readahead for SSD disk seeks

2014-09-24 Thread Donald Smith
We're using cassandra as a key-value store; our values are small.  So we're 
thinking we don't need much disk readahead (e.g., blockdev --getra /dev/sda).  
We're using SSDs.

When cassandra does disk seeks to satisfy read requests, does it typically have 
to read the entire SStable into memory (assuming the bloom filter said yes)?  
If cassandra needs to read in lots of blocks anyway, or if it needs to read the 
entire file during compaction, then I'd expect we might as well have a big 
readahead.   Perhaps there's a tradeoff between read latency and compaction 
time.

Any feedback welcome.

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



Would warnings about overlapping SStables explain high pending compactions?

2014-09-24 Thread Donald Smith
On one of our nodes we have lots of pending compactions (499).  In the past 
we've seen pending compactions go up to 2400 and all the way back down again.

Investigating, I saw warnings such as the following in the logs about 
overlapping SStables and about needing to run nodetool scrub on a table.  
Would the overlapping SStables explain the pending compactions?

WARN [RMI TCP Connection(2)-10.5.50.30] 2014-09-24 09:14:11,207 
LeveledManifest.java (line 154) At level 1, 
SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC-jb-388233-Data.db') 
[DecoratedKey(-6112875836465333229, 
3366636664393031646263356234663832383264616561666430383739383738), 
DecoratedKey(-4509284829153070912, 
3366336562386339376664376633353635333432636662373739626465393636)] overlaps 
SSTableReader(path='/data/data/XYZ/ABC/XYZ-ABC_blob-jb-388150-Data.db') 
[DecoratedKey(-4834684725563291584, 
336633623334363664363632666365303664333936336337343566373838), 
DecoratedKey(-4136919579566299218, 
3366613535646662343235336335633862666530316164323232643765323934)].  This could 
be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have 
dropped sstables from another node into the data directory. Sending back to L0. 
 If you didn't drop in sstables, and have not yet run scrub, you should do so 
since you may also have rows out-of-order within an sstable

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



Is there harm from having all the nodes in the seed list?

2014-09-23 Thread Donald Smith
Is there any harm from having all the nodes listed in the seeds list in 
cassandra.yaml?

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



Is it wise to increase native_transport_max_threads if we have lots of CQL clients?

2014-09-19 Thread Donald Smith
If we have hundreds of CQL clients (for C* 2.0.9), should we increase 
native_transport_max_threads in cassandra.yaml from the default (128) to the 
number of clients?   If we don't, I presume requests will queue up, resulting 
in higher latency.  What's a reasonable max value for 
native_transport_max_threads?

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



Trying to understand cassandra gc logs

2014-09-15 Thread Donald Smith
I understand that cassandra uses ParNew GC for the new generation and CMS for 
the old generation (tenured).   I'm trying to interpret from the logs when a 
full GC happens and what kind of full GC is used.  The logs never say "Full GC" 
or anything like that.  But I see that whenever there's a line like

2014-09-15T18:04:17.197-0700: 117485.192: [CMS-concurrent-mark-start]

the count of full GCs increases from

{Heap after GC invocations=158459 (full 931):

to a line like:

{Heap before GC invocations=158459 (full 932):

See the highlighted lines in the gclog output below.  So, apparently there was 
a full GC between those two lines. Between those lines it also has two lines, 
such as:

   2014-09-15T18:04:17.197-0700: 117485.192: Total time for which application 
threads were stopped: 0.0362080 seconds
   2014-09-15T18:04:17.882-0700: 117485.877: Total time for which application 
threads were stopped: 0.0129660 seconds

Also, the full count (932 above) is always exactly half the FGC number (1864) 
returned by jstat, as in

dc1-cassandra01.dc01 /var/log/cassandra sudo jstat -gcutil 28511
  S0     S1     E      O      P    YGC     YGCT    FGC    FGCT     GCT
 55.82   0.00  82.45  45.02  59.76 165772 5129.728  1864  320.247 5449.975

So, I am apparently correct that "(full 932)" is the count of full GCs. I'm 
perplexed by the log output, though.

I also see lines mentioning concurrent mark-sweep that do not appear to 
correspond to full GCs. So, my questions are: Is CMS also used for full GCs? 
If not, what kind of GC is done? The logs don't say.  Lines saying "Total time 
for which application threads were stopped" appear twice per full GC; why?  
Apparently, even our full GCs are fast: 99% of them finish within 0.18 
seconds; 99.9% finish within 0.5 seconds (which may be too slow for some of our 
clients).

Here below is some log output, with interesting parts highlighted in grey or 
yellow.  Thanks, Don

{Heap before GC invocations=158458 (full 931):
par new generation   total 1290240K, used 1213281K [0x0005bae0, 
0x00061260, 0x00061260)
  eden space 1146880K, 100% used [0x0005bae0, 0x000600e0, 
0x000600e0)
  from space 143360K,  46% used [0x000600e0, 0x000604ed87c0, 
0x000609a0)
  to   space 143360K,   0% used [0x000609a0, 0x000609a0, 
0x00061260)
concurrent mark-sweep generation total 8003584K, used 5983572K 
[0x00061260, 0x0007fae0, 0x0007fae0)
concurrent-mark-sweep perm gen total 44820K, used 26890K [0x0007fae0, 
0x0007fd9c5000, 0x0008)
2014-09-15T18:04:17.131-0700: 117485.127: [GCBefore GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 197474318
Max   Chunk Size: 160662270
Number of Blocks: 3095
Av.  Block  Size: 63804
Tree  Height: 32
Before GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 2285026
Max   Chunk Size: 2279936
Number of Blocks: 8
Av.  Block  Size: 285628
Tree  Height: 5
2014-09-15T18:04:17.133-0700: 117485.128: [ParNew
Desired survivor size 73400320 bytes, new threshold 1 (max 1)
- age   1:   44548776 bytes,   44548776 total
: 1213281K->49867K(1290240K), 0.0264540 secs] 7196854K->6059170K(9293824K)
After GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 195160244
Max   Chunk Size: 160662270
Number of Blocks: 3093
Av.  Block  Size: 63097
Tree  Height: 32
After GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 2285026
Max   Chunk Size: 2279936
Number of Blocks: 8
Av.  Block  Size: 285628
Tree  Height: 5
, 0.0286700 secs] [Times: user=0.37 sys=0.01, real=0.03 secs]
Heap after GC invocations=158459 (full 931):
par new generation   total 1290240K, used 49867K [0x0005bae0, 
0x00061260, 0x00061260)
  eden space 1146880K,   0% used [0x0005bae0, 0x0005bae0, 
0x000600e0)
  from space 143360K,  34% used [0x000609a0, 0x00060cab2e18, 
0x00061260)
  to   space 143360K,   0% used [0x000600e0, 0x000600e0, 
0x000609a0)
concurrent mark-sweep generation total 8003584K, used 6009302K 
[0x00061260, 0x0007fae0, 0x0007fae0)
concurrent-mark-sweep perm gen total 44820K, used 26890K [0x0007fae0, 
0x0007fd9c5000, 0x0008)
}
2014-09-15T18:04:17.161-0700: 117485.156: Total time for which application 
threads were stopped: 0.0421350 seconds
2014-09-15T18:04:17.173-0700: 117485.168: [GC [1 CMS-initial-mark: 
6009302K(8003584K)] 6059194K(9293824K), 0.0231840 secs] [Times: user=0.03 
sys=0.00, real=0.03 secs]
2014-09-15T18:04:17.197-0700: 117485.192: Total time for which application 
threads were stopped: 0.0362080 seconds
2014-09-15T18:04:17.197-0700: 117485.192: [CMS-concurrent-mark-start]
2014-09-15T18:04:17.681-0700: 117485.677: [CMS-concurrent-mark: 0.484/0.484 
secs] 

RE: How often are JMX Cassandra metrics reset?

2014-08-29 Thread Donald Smith
Thanks, Chris.

75thPercentile is clearly NOT lifetime: its value jumps around.
However, I can tell that Max is lifetime; it's been showing the exact same 
value for days, on various nodes. Hence my doubts.

From: Chris Lohfink [mailto:clohf...@blackbirdit.com]
Sent: Thursday, August 28, 2014 3:56 PM
To: user@cassandra.apache.org
Subject: Re: How often are JMX Cassandra metrics reset?

In the version of the metrics library used, there's a uniform reservoir and an 
exponentially weighted one.  These are used to compute the min, max, mean, std 
dev and quantiles.  For the timers, by default it uses the exponentially 
decaying one, which is weighted toward the last 5 minutes.

http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.2.0/com/yammer/metrics/core/Timer.java?av=f

Chris Lohfink


On Aug 28, 2014, at 5:39 PM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:


The metrics OneMinuteRate, FiveMinuteRate, FifteenMinuteRate, and MeanRate are 
NOT lifetime values, but they're all counts of requests, not latency.  The 
latency values (Max, Count, 50thPercentile, Mean, etc.) ARE lifetime values, I 
think, and thus would seem to be kinda useless for me, since our servers have 
been running for months.

Maybe there's a way to reset lifetime metrics to zero. I connected to a 
cassandra server remotely via jConsole (port 7199) and I can read various 
metrics via the MBeans, but I don't see an operation for resetting to zero.   
But perhaps that's because I'm connecting remotely.

ClientRequest/Read/Latency:
LatencyUnit = MICROSECONDS
FiveMinuteRate = 1.12
FifteenMinuteRate = 1.11
RateUnit = SECONDS
MeanRate = 1.65
OneMinuteRate = 1.13
EventType = calls
   Max = 237,373.37
Count = 961,312
50thPercentile = 383.2
Mean = 908.46
Min = 95.64
StdDev = 3,034.62
75thPercentile = 626.34
95thPercentile = 954.31
98thPercentile = 1,443.11
99thPercentile = 1,472.4
999thPercentile = 1,858.1


From: Nick Bailey [mailto:n...@datastax.com]
Sent: Thursday, August 28, 2014 1:50 PM
To: user@cassandra.apache.org
Subject: Re: How often are JMX Cassandra metrics reset?

Those percentile values should be for the lifetime of the node, yes. Depending 
on what version of OpsCenter you are using, it is either using the 'recent' 
metrics described by Rob, or it is using the FiveMinuteRate from JMX as well as 
doing some of its own aggregation depending on the rollup size.

On Thu, Aug 28, 2014 at 12:36 PM, Robert Coli 
rc...@eventbrite.com wrote:
On Thu, Aug 28, 2014 at 9:27 AM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
And yet OpsCenter shows graphs with ever-changing metrics that show recent 
performance. Does OpsCenter not get its stats from JMX?

1) Certain JMX endpoints expose "recent" metrics, or at least used to. These 
are "recent" as in "since the last time someone polled this endpoint".
2) OpsCenter samples via JMX and then stores metrics in its own columnfamily. I 
would not be shocked if it does some minor aggregation as it does so.

This all said, OpsCenter is not Apache Cassandra software, so the Apache 
Cassandra user mailing list may not be the ideal place for it to be discussed 
or supported...

=Rob



Rebuilding a cassandra seed node with the same tokens and same IP address

2014-08-29 Thread Donald Smith
One of our nodes is getting an increasing number of pending compactions due, we 
think, to

https://issues.apache.org/jira/browse/CASSANDRA-7145, which is fixed in the 
upcoming version 2.0.11.  (We had the same error a month ago, but at that time we were 
in pre-production and could just clean the disks on all the nodes and restart. 
Now we want to be cleverer.)


To overcome the issue we figure we should just rebuild the node using the same 
token range, to avoid unneeded data reshuffling.  So we figure we should  (1) 
find the tokens in use on that node via nodetool ring, (2) stop cassandra on 
that node, (3) delete the data directory, (4) Use the tokens saved in step (1) 
as the initial_token list, and (5) restart the node.


But the node is a seed node and cassandra won't bootstrap seed nodes. Perhaps 
removing that node's address from the seeds list on the other nodes (and on 
that node) will be sufficient. That's what "Replacing a Dead Seed Node" 
(http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_seed_node.html)
suggests. Perhaps I can remove the ip address from the seeds list on all nodes 
in the cluster, restart all the nodes, and then restart the bad node with 
auto_bootstrap=true.


I want to use the same IP address, and so I don't think I can follow the 
instructions at

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html,
because they assume the IP address of the dead node and the new node differ.


If I just start it up, it will start serving traffic and read requests will 
fail. It wouldn't be the end of the world (the production use isn't critical 
yet).


Should we use nodetool rebuild $LOCAL_DC?  (though I think that's mostly for 
adding a data center) Should I add it back in and do nodetool repair? I'm 
afraid that would be too slow.


Again, I don't want to REMOVE the node from the cluster: that would cause 
reshuffling of token ranges and data. I want to use the same token range.


Any suggestions?


Thanks, Don


RE: How often are JMX Cassandra metrics reset?

2014-08-28 Thread Donald Smith
And yet OpsCenter shows graphs with ever-changing metrics that show recent 
performance. Does OpsCenter not get its stats from JMX?

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Wednesday, August 27, 2014 12:56 PM
To: user@cassandra.apache.org
Subject: Re: How often are JMX Cassandra metrics reset?

On Wed, Aug 27, 2014 at 12:38 PM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
I’m using JMX to retrieve Cassandra metrics.   I notice that Max and Count are 
cumulative and aren’t reset.   How often are the stats for Mean, 
99thPercentile, etc. reset back to zero?

If they're like the old latency numbers, they are from node startup time and 
are never reset.

=Rob


RE: How often are JMX Cassandra metrics reset?

2014-08-28 Thread Donald Smith
The metrics OneMinuteRate, FiveMinuteRate, FifteenMinuteRate, and MeanRate are 
NOT lifetime values, but they’re all counts of requests, not latency.  The 
latency values (Max, Count, 50thPercentile, Mean, etc.) ARE lifetime values, I 
think, and thus would seem to be kinda useless for me, since our servers have 
been running for months.

Maybe there’s a way to reset lifetime metrics to zero. I connected to a 
cassandra server remotely via jConsole (port 7199) and I can read various 
metrics via the MBeans, but I don’t see an operation for resetting to zero.   
But perhaps that’s because I’m connecting remotely.

ClientRequest/Read/Latency:
LatencyUnit = MICROSECONDS
FiveMinuteRate = 1.12
FifteenMinuteRate = 1.11
RateUnit = SECONDS
MeanRate = 1.65
OneMinuteRate = 1.13
EventType = calls
   Max = 237,373.37
Count = 961,312
50thPercentile = 383.2
Mean = 908.46
Min = 95.64
StdDev = 3,034.62
75thPercentile = 626.34
95thPercentile = 954.31
98thPercentile = 1,443.11
99thPercentile = 1,472.4
999thPercentile = 1,858.1


From: Nick Bailey [mailto:n...@datastax.com]
Sent: Thursday, August 28, 2014 1:50 PM
To: user@cassandra.apache.org
Subject: Re: How often are JMX Cassandra metrics reset?

Those percentile values should be for the lifetime of the node, yes. Depending 
on what version of OpsCenter you are using, it is either using the 'recent' 
metrics described by Rob, or it is using the FiveMinuteRate from JMX as well as 
doing some of its own aggregation depending on the rollup size.

On Thu, Aug 28, 2014 at 12:36 PM, Robert Coli 
rc...@eventbrite.com wrote:
On Thu, Aug 28, 2014 at 9:27 AM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
And yet OpsCenter shows graphs with ever-changing metrics that show recent 
performance. Does OpsCenter not get its stats from JMX?

1) Certain JMX endpoints expose "recent" metrics, or at least used to. These 
are "recent" as in "since the last time someone polled this endpoint".
2) OpsCenter samples via JMX and then stores metrics in its own columnfamily. I 
would not be shocked if it does some minor aggregation as it does so.

This all said, OpsCenter is not Apache Cassandra software, so the Apache 
Cassandra user mailing list may not be the ideal place for it to be discussed 
or supported...

=Rob




How often are JMX Cassandra metrics reset?

2014-08-27 Thread Donald Smith
I'm using JMX to retrieve Cassandra metrics.   I notice that Max and Count are 
cumulative and aren't reset.   How often are the stats for Mean, 
99thPercentile, etc. reset back to zero?

For example, 99thPercentile shows as 1.5 ms. Over how many minutes?

ClientRequest/Read/Latency:
LatencyUnit = MICROSECONDS
FiveMinuteRate = 1.12
FifteenMinuteRate = 1.11
RateUnit = SECONDS
MeanRate = 1.65
OneMinuteRate = 1.13
EventType = calls
   Max = 237,373.37
Count = 961,312
50thPercentile = 383.2
Mean = 908.46
Min = 95.64
StdDev = 3,034.62
75thPercentile = 626.34
95thPercentile = 954.31
98thPercentile = 1,443.11
99thPercentile = 1,472.4
999thPercentile = 1,858.1

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



RE: adding more nodes into the cluster

2014-08-01 Thread Donald Smith
According to datastax’s documentation at 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html
   “By default, this setting [auto_bootstrap] is true and not listed in the 
cassandra.yaml file.”

But http://wiki.apache.org/cassandra/StorageConfiguration  says:
   “Default is: 'false', so that new clusters don't bootstrap immediately. You 
should turn this on when you start adding new nodes to a cluster that already 
has data on it.”

So which is correct?

Also, the two pages disagree on the instructions for how to add new nodes to an 
existing cluster.  The first page says to set auto_bootstrap to ‘false’ when 
adding a new data center to a cluster. “Setting this parameter to false 
prevents the new nodes from attempting to get all the data from the other nodes 
in the data center. When you run nodetool rebuild 
(http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsRebuild.html)
in the last step, each node is properly mapped.”

The second page suggests setting auto_bootstrap to ‘true’ when you add new 
nodes to an existing cluster: “You should turn this on when you start adding 
new nodes to a cluster that already has data on it.”  Perhaps that applies only 
to adding new nodes to an existing data center (not a new data center to an 
existing cluster).

So, I’m not clear what I should do.   I want to add a data center to an 
existing cluster.   If I set auto_bootstrap to true in the new nodes of the new 
data center, will they stream data from the other data centers?  Perhaps they 
will stream only NEW rows.   Perhaps the purpose of doing “nodetool rebuild” is 
to force streaming OLD data (like a repair).  It’s not clear. Maybe 
auto_bootstrap=true is equivalent to (auto_bootstrap=false plus “nodetool 
rebuild”).

Thoughts?

Don
Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Wednesday, July 16, 2014 12:31 PM
To: user@cassandra.apache.org
Subject: Re: adding more nodes into the cluster

On Wed, Jul 16, 2014 at 12:28 PM, Robert Coli 
rc...@eventbrite.com wrote:
It applies whenever one is bootstrapping a node. One is bootstrapping a node 
whenever one starts a node with auto_bootstrap set to true (the default) and 
with either one-or-more tokens in initial_token or num_tokens set.

Ugh sorry :

1) starting a node
2) with auto_bootstrap:true (default)
3) initial_token or num_tokens populated
4) node has never successfully bootstrapped before, and has not therefore 
written the information of its successful bootstrap to the system keyspace

If the node has bootstrapped before, it will not do so again unless 
replace_address is used.

=Rob




Problem with /etc/cassandra for cassandra 2.0.8

2014-06-17 Thread Donald Smith
I installed a package version of cassandra via sudo yum install 
cassandra20.noarch into a clean host and got:

cassandra20.noarch  2.0.8-2 @datastax

That resulted in a problem:  /etc/cassandra/ did not exist.  So I did sudo yum 
downgrade cassandra20.noarch and got version 2.0.7. That fixed the problem: 
/etc/cassandra appeared.

Anyone else have a problem with version 2.0.8?  I don't see any release note 
suggesting they moved that directory.


Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]



RE: Cassandra data retention policy

2014-04-28 Thread Donald Smith
CQL lets you specify a default TTL per column family/table, via the table 
property default_time_to_live = 86400.
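
For example (the table name here is illustrative):

ALTER TABLE events WITH default_time_to_live = 86400;  -- expire rows after one day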

From: Redmumba [mailto:redmu...@gmail.com]
Sent: Monday, April 28, 2014 12:51 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra data retention policy

Have you looked into using a TTL?  You can set this per insert (unfortunately, 
it can't be set per CF) and values will be tombstoned after that amount of 
time.  I.e.,

INSERT INTO ... VALUES ... USING TTL 15552000

Keep in mind, after the values have expired, they will essentially become 
tombstones--so you will still need to run clean-ups (probably daily) to clear 
up space.

Does this help?
One caveat is that this is difficult to apply to existing rows--i.e., you can't 
bulk-update a bunch of rows with this data.  As such, another good suggestion 
is to simply have a secondary index on a date field of some kind, and run a 
bulk remove (and subsequent clean-up) daily/weekly/whatever.

On Mon, Apr 28, 2014 at 11:31 AM, Han Jia 
johnideal...@gmail.com wrote:
Hi guys,


We have a processing system that just uses the data for the past six months in 
Cassandra. Any suggestions on the best way to manage the old data in order to 
save disk space? We want to keep it as backup but it will not be used unless we 
need to do recovery. Thanks in advance!


-John



Lots of commitlog files

2014-04-14 Thread Donald Smith
1. With cassandra 2.0.6, we have 547G of files in /var/lib/commitlog/.  I 
started a nodetool flush 65 minutes ago; it's still running.  The 17536 
commitlog files have been created in the last 3 days.  (The node has 2.1T of 
sstable data in /var/lib/cassandra/data/.  This is in staging, not prod.) Why 
so many commit logs?  Here are our commitlog-related settings in cassandra.yaml:

commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
# The size of the individual commitlog file segments.  A commitlog
# archiving commitlog segments (see commitlog_archiving.properties),
commitlog_segment_size_in_mb: 32
# Total space to use for commitlogs.  Since commitlog segments are
# segment and remove it.  So a small total commitlog space will tend
# commitlog_total_space_in_mb: 4096

Maybe we should set commitlog_total_space_in_mb to something other than the 
default. According to OpsCenter, commitlog_total_space_in_mb is None.But 
it seems odd that there'd be so many commit logs.

The node is under heavy write load.   There are about 2900 compactions pending.

We are NOT archiving commitlogs, via commitlog_archiving.properties.

BTW, the documentation for nodetool (http://wiki.apache.org/cassandra/NodeTool) 
says:
Flush

Flushes memtables (in memory) to SSTables (on disk), which also enables 
CommitLog (http://wiki.apache.org/cassandra/CommitLog) segments to be deleted.
But even after doing a flush, the /var/lib/commitlog dir still has 1G of files, 
even after waiting 30 minutes.  Each file is 32M in size, plus or minus a few 
bytes.  I tried this on other clusters with much smaller amounts of data.   
Even restarting Cassandra doesn't help.

I surmise that the 1GB of commit logs is normal: Cassandra probably allocates 
that space as a workspace.


Thanks,  Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]


RE: Lots of commitlog files

2014-04-14 Thread Donald Smith
Another thing.   cassandra.yaml says:

# Total space to use for commitlogs.  Since commitlog segments are
# mmapped, and hence use up address space, the default size is 32
# on 32-bit JVMs, and 1024 on 64-bit JVMs.
#
# If space gets above this value (it will round up to the next nearest
# segment multiple), Cassandra will flush every dirty CF in the oldest
# segment and remove it.  So a small total commitlog space will tend
# to cause more flush activity on less-active columnfamilies.
# commitlog_total_space_in_mb: 4096

We're using a 64 bit linux with a 64 bit JVM:

Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)

but our commit log files are each 32MB in size. Is this indicative of a bug?  
Shouldn't they be 1024MB in size?

  Don

From: Donald Smith
Sent: Monday, April 14, 2014 12:04 PM
To: 'user@cassandra.apache.org'
Subject: Lots of commitlog files

1. With cassandra 2.0.6, we have 547G of files in /var/lib/commitlog/.  I 
started a nodetool flush 65 minutes ago; it's still running.  The 17536 
commitlog files have been created in the last 3 days.  (The node has 2.1T of 
sstable data in /var/lib/cassandra/data/.  This is in staging, not prod.) Why 
so many commit logs?  Here are our commitlog-related settings in cassandra.yaml:
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
# The size of the individual commitlog file segments.  A commitlog
# archiving commitlog segments (see commitlog_archiving.properties),
commitlog_segment_size_in_mb: 32
# Total space to use for commitlogs.  Since commitlog segments are
# segment and remove it.  So a small total commitlog space will tend
# commitlog_total_space_in_mb: 4096

Maybe we should set commitlog_total_space_in_mb to something other than the 
default. According to OpsCenter, commitlog_total_space_in_mb is None.But 
it seems odd that there'd be so many commit logs.

The node is under heavy write load.   There are about 2900 compactions pending.

We are NOT archiving commitlogs, via commitlog_archiving.properties.

BTW, the documentation for nodetool (http://wiki.apache.org/cassandra/NodeTool) 
says:
Flush

Flushes memtables (in memory) to SSTables (on disk), which also enables 
CommitLog (http://wiki.apache.org/cassandra/CommitLog) segments to be deleted.
But even after doing a flush, the /var/lib/commitlog dir still has 1G of files, 
even after waiting 30 minutes.  Each file is 32M in size, plus or minus a few 
bytes.  I tried this on other clusters with much smaller amounts of data.   
Even restarting Cassandra doesn't help.

I surmise that the 1GB of commit logs is normal: Cassandra probably allocates 
that space as a workspace.


Thanks,  Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]


Setting gc_grace_seconds to zero and skipping nodetool repair (was RE: Timeseries with TTL)

2014-04-07 Thread Donald Smith
This statement is significant: “BTW if you never delete and only ttl your 
values at a constant value, you can set gc=0 and forget about periodic repair 
of the table, saving some space, IO, CPU, and an operational step.”

Setting gc_grace_seconds to zero has the effect of not storing hinted handoffs 
(which prevent deleted data from reappearing), I believe.   “Periodic repair” 
refers to running “nodetool repair” (aka Anti-Entropy).

I too have wondered if setting gc_grace_seconds to zero and skipping “nodetool 
repair” are safe.

We’re using C* 2.0.6. In the 2.0.x versions, with vnodes, “nodetool repair …” 
is very slow (see https://issues.apache.org/jira/browse/CASSANDRA-5220 and 
https://issues.apache.org/jira/browse/CASSANDRA-6611).   We found repairs via 
“nodetool repair” unacceptably slow, even when we restricted them to one 
table, and often the repairs hung or failed.  We also tried subrange repairs 
and the other options.

Our app does no deletes and only rarely updates a row (if there was bad data 
that needed to be replaced).  So it’s very tempting to set gc_grace_seconds = 0 
in the table definitions and skip “nodetool repair”.
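
Per table, that change would be something like (the table name is illustrative):

ALTER TABLE reports WITH gc_grace_seconds = 0;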

But there is Cassandra documentation warning that regular repairs are necessary 
even if you don’t do deletes. For example, 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
says:

 Note: If deletions never occur, you should still schedule regular repairs. 
Be aware that setting a column to null is a delete.

The apache wiki  
https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair says:

Unless your application performs no deletes, it is strongly recommended that 
production clusters run nodetool repair periodically on all nodes in the 
cluster.

*IF* your operations team is sufficiently on the ball, you can get by without 
repair as long as you do not have hardware failure -- in that case, 
HintedHandoff (https://wiki.apache.org/cassandra/HintedHandoff) is adequate to 
repair successful updates that some replicas have missed. Hinted handoff is 
active for max_hint_window_in_ms after a replica fails.

Full repair or re-bootstrap is necessary to re-replicate data lost to hardware 
failure (see below).
So, if there are hardware failures, “nodetool repair” is needed.  And 
http://planetcassandra.org/general-faq/ says:

Anti-Entropy Node Repair – For data that is not read frequently, or to update 
data on a node that has been down for an extended period, the node repair 
process (also referred to as anti-entropy repair) ensures that all data on a 
replica is made consistent. Node repair (using the nodetool utility) should be 
run routinely as part of regular cluster maintenance operations.

If RF=2, ReadConsistency is ONE and data failed to get replicated to the second 
node, then during a read might the app incorrectly return “missing data”?

It seems to me that the need to run “nodetool repair” reflects a design bug; it 
should be automated.

Don

From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: Sunday, April 06, 2014 11:31 AM
To: user@cassandra.apache.org
Subject: Re: Timeseries with TTL

Since you are using LeveledCompactionStrategy there is no major/minor 
compaction - just compaction.

Leveled compaction does more work - your logs don't look unreasonable to me - 
the real question is whether your nodes can keep up w the IO. SSDs work best.

BTW if you never delete and only ttl your values at a constant value, you can 
set gc=0 and forget about periodic repair of the table, saving some space, IO, 
CPU, and an operational step.

If your nodes cannot keep up the IO, switch to SizeTieredCompaction and monitor 
read response times. Or add SSDs.

In my experience, for smallish nodes running C* 2 without SSDs, 
LeveledCompactionStrategy can cause the disk cache to churn, reducing read 
performance substantially. So watch out for that.

Good luck,

Michael

On Sun, Apr 6, 2014 at 10:25 AM, Vicent Llongo 
villo...@gmail.com wrote:
Hi,

Most of the queries to that table are just getting a range of values for a 
metric:
SELECT val FROM metrics_5min WHERE uid = ? AND metric = ? AND ts >= ? AND 
ts <= ?

I'm not sure from the logs what kind of compactions they are. This is what I 
see in system.log (grepping for that specific table):

...
INFO [CompactionExecutor:742] 2014-04-06 13:30:11,223 CompactionTask.java (line 
105) Compacting 
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14991-Data.db'),
 
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14990-Data.db')]
INFO [CompactionExecutor:753] 2014-04-06 13:35:22,495 CompactionTask.java (line 
105) Compacting 
[SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14992-Data.db'),
 
SSTableReader(path='/mnt/disk1/cassandra/data/keyspace/metrics_5min/keyspace-metrics_5min-ic-14993-Data.db')]
INFO 

Question about rpms from datastax

2014-03-27 Thread Donald Smith
On http://rpm.riptano.com/community/noarch/ what's the difference between

cassandra20-2.0.6-1.noarch.rpm 
(http://rpm.riptano.com/community/noarch/cassandra20-2.0.6-1.noarch.rpm) and 
dsc20-2.0.6-1.noarch.rpm 
(http://rpm.riptano.com/community/noarch/dsc20-2.0.6-1.noarch.rpm)?

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]


nodetool scrub throws exception FileAlreadyExistsException

2014-03-26 Thread Donald Smith

% time nodetool scrub -s as_reports data_report_info_2011
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms8192M -Xmx8192M 
-Xmn2048M -XX:+HeapDumpOnOutOfMemoryError -Xss256k
Exception in thread "main" FSWriteError in 
/mnt/cassandra-storage/data/as_reports/data_report_info_2011/snapshots/pre-scrub-1395848747073/as_reports-data_report_info_2011-jb-3-Data.db
at 
org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:84)
at 
org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215)
at 
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1817)
at 
org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1123)
at 
org.apache.cassandra.service.StorageService.scrub(StorageService.java:2197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.reflect.misc.Trampoline.invoke(Unknown Source)
at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.reflect.misc.MethodUtil.invoke(Unknown Source)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
Source)
at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown 
Source)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown 
Source)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown 
Source)
at javax.management.remote.rmi.RMIConnectionImpl.access$300(Unknown 
Source)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown 
Source)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown 
Source)
at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
at sun.rmi.transport.Transport$1.run(Unknown Source)
at sun.rmi.transport.Transport$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Unknown Source)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown 
Source)
at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.file.FileAlreadyExistsException: 
/mnt/cassandra-storage/data/as_reports/data_report_info_2011/snapshots/pre-scrub-1395848747073/as_reports-data_report_info_2011-jb-3-Data.db
 -> 
/mnt/cassandra-storage/data/as_reports/data_report_info_2011/as_reports-data_report_info_2011-jb-3-Data.db
at sun.nio.fs.UnixException.translateToIOException(Unknown Source)
at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.UnixFileSystemProvider.createLink(Unknown Source)
at java.nio.file.Files.createLink(Unknown Source)
at 
org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:80)
... 39 more
1.112u 0.122s 3:38.36 0.5%  0+0k 0+328io 0pf+0w


That table is new and very unlikely to be corrupted.  I retried the command 
without -s and it succeeded right away. I tried again WITH -s and it 
succeeded again too.

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

[AudienceScience]

inline: image001.jpg

Question about how compaction and partition keys interact

2014-03-26 Thread Donald Smith
In CQL we need to decide between using ((customer_id,type),date) as the CQL 
primary key for a reporting table, versus ((customer_id,date),type).

We store reports for every day.  If we use (customer_id,type) as the partition 
key (physical key), then we have  a WIDE ROW where each date's data is stored 
in a different column. Over time, as new reports are added for different dates, 
the row will get wider and wider, and I thought that might cause more work for 
compaction.

So, would a partition key of (customer_id,date) yield better compaction 
behavior?

Again, if we use (customer_id,type) as the partition key, then over time, as 
new columns are added to that row for different dates, I'd think that 
compaction would have to merge new data for a given physical row from multiple 
sstables. That would make compaction expensive.  But if we use 
(customer_id,date) as the partition key, then new data will be added to new 
physical rows, and so compaction would have less work to do.

My question is really about how compaction interacts with partition keys.  
Someone on the Cassandra irc channel, 
http://webchat.freenode.net/?channels=#cassandra, said that when partition keys 
overlap between sstables, there's only slightly more work to do than when 
they don't, for merging sstables in compaction.  So he thought the first form,  
((customer_id,type),date),  would be better.

One advantage of the first form, ((customer_id,type),date) ,  is that we can 
get all report data for all dates for a given customer and type in a single 
wide row -- and we do have an (uncommon) use case for such reports.

If we used a primary key of ((customer_id,type,date)), then the rows would be 
un-wide; that wouldn't take advantage of clustering columns and (like the 
second form) wouldn't support the (uncommon) use case mentioned in the previous 
paragraph.
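
For concreteness, the two candidate schemas look like this in CQL (a sketch; the 
table names and the report column are illustrative):

-- form 1: wide row per (customer_id, type); date is the clustering column
CREATE TABLE reports_by_type (
    customer_id text,
    type text,
    date timestamp,
    report blob,
    PRIMARY KEY ((customer_id, type), date)
);

-- form 2: one partition per (customer_id, date); type is the clustering column
CREATE TABLE reports_by_date (
    customer_id text,
    date timestamp,
    type text,
    report blob,
    PRIMARY KEY ((customer_id, date), type)
);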

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

RE: memory usage spikes

2014-03-26 Thread Donald Smith
Prem,

Did you follow the instructions at 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html?scroll=reference_ds_sxl_gf3_2k

And did you install jna-3.2.7.jar into /usr/share/java, as per 
http://www.datastax.com/documentation/cassandra/2.0/mobile/cassandra/install/installJnaRHEL.html
 ?

Don

From: prem yadav [mailto:ipremya...@gmail.com]
Sent: Wednesday, March 26, 2014 10:36 AM
To: user@cassandra.apache.org
Subject: Re: memory usage spikes

here:

ps -p `/usr/java/jdk1.6.0_37/bin/jps | awk '/Dse/ {print $1}'` uww

USER   PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
497  20450  0.9 31.0 4727620 2502644 ? SLl  06:55   3:28 
/usr/java/jdk1.6.0_37//bin/java -ea 
-javaagent:/usr/share/dse/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms1968M -Xmx1968M -Xmn400M 
-XX:+HeapDumpOnOutOfMemoryError -Xss190k -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true 
-Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true 
-Dcassandra-pidfile=/var/run/dse.pid -cp 
:/usr/share/dse/dse.jar:/usr/share/dse/common/commons-codec-1.6.jar:/usr/share/dse/common/commons-io-2.4.jar:/usr/share/dse/common/guava-13.0.jar:/usr/share/dse/common/jbcrypt-0.3m.jar:/usr/share/dse/common/log4j-1.2.16.jar:/usr/share/dse/common/slf4j-api-1.6.1.jar:/usr/share/dse/common/slf4j-log4j12-1.6.1.jar:/etc/dse:/usr/share/java/jna.jar:/etc/dse/cassandra:/usr/share/dse/cassandra/tools/lib/stress.jar:/usr/share/dse/cassandra/lib/antlr-2.7.7.jar:/usr/share/dse/cassandra/lib/antlr-3.2.jar:/usr/share/dse/cassandra/lib/antlr-runtime-3.2.jar:/usr/share/dse/cassandra/lib/avro-1.4.0-cassandra-1.jar:/usr/share/dse/cassandra/lib/cassandra-all-1.1.9.10.jar:/usr/share/dse/cassandra/lib/cassandra-clientutil-1.1.9.10.jar:/usr/share/dse/cassandra/lib/cassandra-thrift-1.1.9.10.jar:/usr/share/dse/cassandra/lib/commons-cli-1.1.jar:/usr/share/dse/cassandra/lib/commons-codec-1.6.jar:/usr/share/dse/cassandra/lib/commons-lang-2.4.jar:/usr/share/dse/cassandra/lib/commons-logging-1.1.1.jar:/usr/share/dse/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/dse/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/dse/cassandra/lib/guava-13.0.jar:/usr/share/dse/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/dse/cassandra/lib/httpclient-4.0.1.jar:/usr/share/dse/cassandra/lib/httpcore-4.0.1.jar:/usr/share/dse/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/dse/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/dse/cassandra/lib/jamm-0.2.5.jar:/usr/share/dse/cassandra/lib/jline-0.9.94.jar:/usr/share/dse/cassandra/lib/joda-time-1.6.2.jar:/usr/share/dse/cassandra/lib/json-simple-1.1.jar:/usr/share/dse/cassandra/lib/libthrift-0.7.0.jar:/usr/share/dse/cassandra/lib/log4j-1.2.16.jar:/usr/share/dse/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/dse/cassandra/lib/servlet-api-2.5.jar:/usr/share/dse/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/dse/cassandra/lib/snakeyaml-1.6.jar:/usr/share/dse/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/dse/cassandra/lib/snaptree-0.1.jar:/usr/share/dse/cassandra/lib/stringtemplate-3.2.jar::/usr/share/dse/solr/lib/solr-4.0.2.4-SNAPSHOT-uber.jar:/usr/share/dse/solr/lib/solr-web-4.0.2.4-SNAPSHOT.jar:/usr/share/dse/solr/conf::/usr/share/dse/tomcat/lib/annotations-api-6.0.32.jar:/usr/share/dse/tomcat/lib/catalina-6.0.32.jar:/usr/share/dse/tomcat/lib/catalina-ha-6.0.32.jar:/usr/share/dse/tomcat/lib/coyote-6.0.32.jar:/usr/share/dse/tomcat/lib/el-api-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-el-6.0.29.jar:/usr/share/dse/tomcat/lib/jasper-jdt-6.0.29.jar:/usr/share/dse/tomcat/lib/jsp-api-6.0.29.jar:/usr/share/dse/tomcat/lib/juli-6.0.32.jar:/usr/share/dse/tomcat/lib/servlet-api-6.0.29.jar:/usr/share/dse/tomcat/lib/tribes-6.0.32.jar:/usr/share/dse/tomcat/conf::/usr/share/dse/hadoop:/etc/dse/hadoop:/usr/share/dse/hadoop/lib/ant-1.6.5.jar:/usr/share/dse/hadoop/lib/automaton-1.11-8.jar:/usr/share/dse/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/share/dse/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/share/dse/hadoop/lib/commons-cli-1.2.jar:/usr/share/dse/hadoop/lib/commons-codec-1.4.jar:/usr/share/dse/hadoop/lib/commons-collections-3.2.1.jar:/usr/share/dse/hadoop/lib/commons-configuration-1.6.jar:/usr/share/dse/hadoop/lib/commons-digester-1.8.jar:/usr/share/dse/hadoop/lib/commons-el-1.0.jar:/usr/share/dse/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/share/dse/hadoop/lib/commons-lang-2.4.jar:/u

It's the spike in RAM usage. Now it is normal but it keeps showing the spikes.

On Wed, Mar 26, 2014 at 5:31 PM, Marcin Cabaj 

RE: Question about how compaction and partition keys interact

2014-03-26 Thread Donald Smith
My underlying question is about the effects of the partitioning key on 
compaction.   Specifically, would having date as part of the partitioning key 
make compaction easier (because compaction wouldn't have to merge wide rows 
over multiple days)?   According to the person on irc, it wouldn't make much 
difference.

We care mostly about read times. If read times were all we cared about, we'd 
use a CQL primary key of ((customer_id,type), date), especially since it lets 
us efficiently iterate over all dates for a given customer and type.  I also 
care about compaction time, and if the other primary key form decreased 
compaction time, I might go for it. We have terabytes of data.

I don't think we ever have to query all types for a given customer or date.  
That is, we are always given a specific customer and type, plus usually but not 
always a date.
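
Concretely, the reads that motivate ((customer_id,type), date) look like this (a 
sketch; the table name and values are illustrative):

-- all dates for a given customer and type, in one partition, in clustering order
SELECT * FROM reports_by_type WHERE customer_id = 'c42' AND type = 'daily';

-- a range of dates within that same partition
SELECT * FROM reports_by_type WHERE customer_id = 'c42' AND type = 'daily'
    AND date >= '2014-03-01' AND date < '2014-04-01';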

Thanks, Don

From: Jonathan Lacefield [mailto:jlacefi...@datastax.com]
Sent: Wednesday, March 26, 2014 11:20 AM
To: user@cassandra.apache.org
Subject: Re: Question about how compaction and partition keys interact

Don,

  What is the underlying question?  Are you trying to figure out what's going to be 
faster for reads, or are you really concerned about storage?

  The recommendation typically provided is to suggest that tables are modeled 
based on query access, to enable the fastest read performance.

  In your example, will your app's queries look for
  1)  customer interactions by type by day, with the ability to
   - sort by day within a type
   - grab ranges of dates for a type quickly
   - or pull all dates (and cell data) for a type
   or
 2)  customer interactions by date by type, with the ability to
   - sort by type within a date
   - grab ranges of types for a date quickly
   - or pull all types data for a date

  We also typically recommend that partitions stay within ~100k columns or 
~100MB per partition.  With your first scenario, the wide row, you wouldn't hit the 
number of columns for ~273 years :)

  What's interesting in your modeling scenario is that, with the current 
options, you don't have the ability to easily pull all dates for a customer 
without specifying the type, specific dates, or using ALLOW FILTERING.  Did you 
ever consider partitioning simply on customer and using date and type as 
clustering keys?
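
  That alternative would look roughly like this (a sketch; the non-key columns are 
illustrative), trading a wider partition per customer for the ability to read all 
dates without naming a type:

CREATE TABLE reports_by_customer (
    customer_id text,
    date timestamp,
    type text,
    report blob,
    PRIMARY KEY (customer_id, date, type)
);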

  Hope that helps.

Jonathan




Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield


http://www.datastax.com/what-we-offer/products-services/training/virtual-training

On Wed, Mar 26, 2014 at 1:22 PM, Donald Smith 
donald.sm...@audiencescience.com 
wrote:
In CQL we need to decide between using ((customer_id,type),date) as the CQL 
primary key for a reporting table, versus ((customer_id,date),type).

We store reports for every day.  If we use (customer_id,type) as the partition 
key (physical key), then we have  a WIDE ROW where each date's data is stored 
in a different column. Over time, as new reports are added for different dates, 
the row will get wider and wider, and I thought that might cause more work for 
compaction.

So, would a partition key of (customer_id,date) yield better compaction 
behavior?

Again, if we use (customer_id,type) as the partition key, then over time, as 
new columns are added to that row for different dates, I'd think that 
compaction would have to merge new data for a given physical row from multiple 
sstables. That would make compaction expensive.  But if we use 
(customer_id,date) as the partition key, then new data will be added to new 
physical rows, and so compaction would have less work to do.

My question is really about how compaction interacts with partition keys.  
Someone on the Cassandra irc channel, 
http://webchat.freenode.net/?channels=#cassandra, said that when partition keys 
overlap between sstables, there's only slightly more work to do than when 
they don't, for merging sstables in compaction.  So he thought the first form,  
((customer_id,type),date),  would be better.

One advantage of the first form, ((customer_id,type),date) ,  is that we can 
get all report data for all dates for a given customer and type in a single 
wide row -- and we do have an (uncommon) use case for such reports.

If we used a primary key of ((customer_id,type,date)), then the rows would be 
un-wide; that wouldn't take advantage of clustering columns and (like the 
second form) wouldn't support the (uncommon) use case mentioned in the previous 
paragraph.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

Speed of sstableloader

2014-03-11 Thread Donald Smith
I tested bulk loading in cassandra with CQLSSTableWriter and sstableloader.

It turns out that writing 1 million rows with sstableloader took over twice as 
long as inserting regularly with batch CQL statements from Java 
(cassandra-driver-core, version 2.0.0). Specifically, the call to 
sstableloader shown below took just over 12 minutes, while inserting with Java 
batch statements took just over 5 minutes.

I checked this twice and the same thing happened both times.

Is this expected?

Thanks, Don

Here's the code (slightly edited and abbreviated):

import org.apache.cassandra.exceptions.InvalidRequestException;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Random;

// sstableloader -v -d 10.12.2.91,10.12.2.92,10.12.2.93 /tmp/test/test_table

public class CreateLoadableSSTableCQL {
..
    // -
    private static void create(int count) throws IOException, InvalidRequestException {
        String schema = "CREATE TABLE test.test_table (id text PRIMARY KEY, value text)";
        String insert = "INSERT INTO test.test_table (id, value) VALUES (?, ?)";

        // Write count rows of random data as SSTables under /tmp/test/test_table,
        // ready to be streamed into the cluster with sstableloader.
        CQLSSTableWriter writer = CQLSSTableWriter.builder()
                .inDirectory("/tmp/test/test_table")
                .forTable(schema)
                .using(insert)
                .build();
        for (int i = 0; i < count; i++) {
            writer.addRow(makeRandomString(32), makeRandomString(100));
        }
        writer.close();
    }
    // --
    public static void main(String[] args) {
        int count = 1000000; // 12.1 minutes using sstableloader on qa; 5.1 minutes using regular batched inserts
        if (args.length > 0) {
            count = Integer.parseInt(args[0]);
        }
        try {
            create(count);
        } catch (InvalidRequestException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
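
For reference, the batched-insert side of the comparison corresponds to CQL 
batches of roughly this shape (a sketch; the batch type and the number of inserts 
per batch actually used from Java are assumptions):

BEGIN UNLOGGED BATCH
    INSERT INTO test.test_table (id, value) VALUES ('id-1', 'value-1');
    INSERT INTO test.test_table (id, value) VALUES ('id-2', 'value-2');
    -- ... more inserts per batch ...
APPLY BATCH;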




RE: Cassandra DSC 2.0.5 not starting - * could not access pidfile for Cassandra

2014-03-10 Thread Donald Smith
You may need to do chown -R cassandra /var/lib/cassandra /var/log/cassandra.

Don

From: user 01 [mailto:user...@gmail.com]
Sent: Monday, March 10, 2014 10:23 AM
To: user@cassandra.apache.org
Subject: Re: Cassandra DSC 2.0.5 not starting - * could not access pidfile for 
Cassandra

$ sudo su - cassandra

I don't know why, but this isn't actually working. It does not switch me to the 
cassandra user [btw, should this actually switch me to the cassandra user??]. 
This user switching on my servers does not work for system users like the tomcat7 
user and the cassandra user, but works for users that were created manually. I 
tested this on two of my test servers, with the same results on both.


RE: replication_factor: ?

2014-03-07 Thread Donald Smith
Robert, please elaborate on why you say "To make best use of Cassandra, my minimum 
recommendation is usually RF=3, N=6."

I surmise that with fewer than 6 nodes, you'd likely perform better with a 
sequential/single-node solution: you need at least six nodes to overcome the 
overheads of concurrency.  But that's a vague explanation.

Thanks, Don

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: Friday, March 07, 2014 11:36 AM
To: user@cassandra.apache.org
Subject: Re: replication_factor: ?

On Fri, Mar 7, 2014 at 7:26 AM, Daniel Curry daniel.cu...@arrayent.com wrote:
  I would like to know the rule of thumb for the replication_factor number. I 
think the answer depends on how many nodes one has? I.e., three nodes would mean 
the number 3.  What would happen if I put the number 2 for a three-node cluster?

To make best use of Cassandra, my minimum recommendation is usually RF=3, N=6. 
There are certainly valid use cases with lower RF or N but from what I can tell 
they are an order of magnitude less common.
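
For reference, RF is set per keyspace; RF=3 looks like this in CQL (a sketch; the 
keyspace and data center names are illustrative):

CREATE KEYSPACE reports
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

(With RF=2 on a three-node cluster, each row is stored on two of the three nodes; 
a QUORUM read or write then needs both replicas, so a single down node blocks 
quorum operations for the rows it holds.)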

=Rob



RE: Supported Cassandra version for CentOS 5.5

2014-02-26 Thread Donald Smith
Oh, I should add that I was trying to use Cassandra 2.0.X on CentOS and it 
needed CentOS 6.2+.

Don

From: Arindam Barua [mailto:aba...@247-inc.com]
Sent: Wednesday, February 26, 2014 1:52 AM
To: user@cassandra.apache.org
Subject: RE: Supported Cassandra version for CentOS 5.5


I am running Cassandra 1.2.12 on CentOS 5.10.
Was running 1.1.15 previously without any issues as well.

-Arindam

From: Donald Smith [mailto:donald.sm...@audiencescience.com]
Sent: Tuesday, February 25, 2014 3:40 PM
To: user@cassandra.apache.org
Subject: RE: Supported Cassandra version for CentOS 5.5


I was unable to get cassandra working with CentOS 5.X. I needed to use CentOS 
6.2 or 6.4.



Don


From: Hari Rajendhran hari.rajendh...@tcs.com
Sent: Tuesday, February 25, 2014 2:34 AM
To: user@cassandra.apache.org
Subject: Supported Cassandra version for CentOS 5.5

Hi,

Currently I am using CentOS 5.5. I need clarification on the latest 
Cassandra version (preferably 2.0.4) that my OS supports.



Best Regards
Hari Krishnan Rajendhran
Hadoop Admin
DESS-ABIM ,Chennai BIGDATA Galaxy
Tata Consultancy Services
Cell:- 9677985515
Mailto: hari.rajendh...@tcs.com
Website: http://www.tcs.com/

Experience certainty.  IT Services
   Business Solutions
   Consulting


RE: Supported Cassandra version for CentOS 5.5

2014-02-25 Thread Donald Smith
I was unable to get cassandra working with CentOS 5.X. I needed to use CentOS 
6.2 or 6.4.


Don


From: Hari Rajendhran hari.rajendh...@tcs.com
Sent: Tuesday, February 25, 2014 2:34 AM
To: user@cassandra.apache.org
Subject: Supported Cassandra version for CentOS 5.5

Hi,

Currently I am using CentOS 5.5. I need clarification on the latest 
Cassandra version (preferably 2.0.4) that my OS supports.



Best Regards
Hari Krishnan Rajendhran
Hadoop Admin
DESS-ABIM ,Chennai BIGDATA Galaxy
Tata Consultancy Services
Cell:- 9677985515
Mailto: hari.rajendh...@tcs.com
Website: http://www.tcs.com/

Experience certainty.  IT Services
   Business Solutions
   Consulting


Corrupted Index File exceptions in 2.0.5

2014-02-18 Thread Donald Smith
We're getting exceptions like the one below using Cassandra 2.0.5.  A Google 
search turns up nothing about these except the source code.  Does anyone have any 
insight?

ERROR [CompactionExecutor:188] 2014-02-12 04:15:53,232 CassandraDaemon.java 
(line 192) Exception in thread Thread[CompactionExecutor:188,1,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
Corrupted Index File   -jb-48064-CompressionInfo.db: read 20252 but 
expected 20253 chunks.

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

RE: Dangers of sudo swapoff --all

2014-02-13 Thread Donald Smith
I meant to say: doing "sudo swapon -a" on that node fixed the problem.

From: Donald Smith [mailto:donald.sm...@audiencescience.com]
Sent: Thursday, February 13, 2014 2:57 PM
To: 'user@cassandra.apache.org'
Subject: Dangers of sudo swapoff --all

I followed the recommendations at 
http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installRecommendSettings.html
 and did:

$ sudo swapoff --all

on each of the cassandra servers in my test cluster.

I noticed, though, that sometimes the cassandra server and other processes on 
one of the nodes suddenly crashed, with no messages indicating why.

It turns out that on that node there wasn't much memory, and I was running 
other processes, so when the OS detected that there was insufficient memory for 
an operation, it unceremoniously killed some processes.  Doing sudo swapo -a 
fixed the problem.  This happened on both CentOS 6.2 and CentOS 6.4.

So, if you do sudo swapoff --all, make sure you're not going to run out of 
memory!

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

Warning about copying and pasting from datastax configuration page: weird characters in config

2014-02-11 Thread Donald Smith
In 
http://www.datastax.com/documentation/cassandra/2.0/mobile/cassandra/install/installRecommendSettings.html
  it says:

Packaged installs: Ensure that the following settings are included in the 
/etc/security/limits.d/cassandra.conf file:
cassandra - memlock unlimited
cassandra - nofile 100000
cassandra - nproc 32768
cassandra - as unlimited

But when I copy and paste those four lines to Linux, it inserts periods in the 
first two lines so it looks like this:
cassandra - memlock.unlimited
cassandra - nofile.100000
cassandra - nproc 32768
cassandra - as unlimited

This happens for both Firefox and Chrome.  And it happened for my coworker too 
(though for him the spaces after “memlock” and “nofile” were deleted).  If I 
paste to Windows it doesn’t happen.

Using firebug, I found the HTML source:

<pre class="pre">cassandra‌·-‌·memlock&ensp;unlimited‌¶cassandra‌·-‌·nofile&ensp;100000‌¶cassandra‌·-‌·nproc‌·32768‌¶cassandra‌·-‌·as‌·unlimited‌¶</pre>

The HTML on that page 
http://www.datastax.com/documentation/cassandra/2.0/mobile/cassandra/install/installRecommendSettings.html
 seems fragile.



According to http://www.w3.org/TR/html4/sgml/entities.html:



<!ENTITY ensp CDATA "&#8194;" -- en space, U+2002 ISOpub -->



There were other spurious characters included in some config I pasted from 
there, and that caused headaches. Specifically, earlier I saw:

cassandraâ- memlock unlimited
cassandraâ- nofile 100000
cassandra - nproc 32768
cassandra - as unlimited

(with the weird â chars added).

Don


Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

RE: Warning about copying and pasting from datastax configuration page: weird characters in config

2014-02-11 Thread Donald Smith
The same problem happens with the non-mobile page: 
http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installRecommendSettings.html

I used the mobile page because someone embedded that link in a wiki page I was 
referring to.  (I'll change the link.)

 Don

-Original Message-
From: Michael Shuler [mailto:mshu...@pbandjelly.org] On Behalf Of Michael Shuler
Sent: Tuesday, February 11, 2014 2:58 PM
To: user@cassandra.apache.org
Subject: Re: Warning about copying and pasting from datastax configuration 
page: weird characters in config

On 02/11/2014 04:50 PM, Donald Smith wrote:
 In
 http://www.datastax.com/documentation/cassandra/2.0/mobile/cassandra/install/installRecommendSettings.html
   it says:

Just curious.. why are you using the mobile site on a desktop, instead of the 
main page? [0]

--
Michael

[0]
http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/install/installRecommendSettings.html


RE: Question about local reads with multiple data centers

2014-01-30 Thread Donald Smith
I found the answer.

By default, the Datastax driver for Cassandra uses the RoundRobinPolicy for 
deciding which Cassandra node a client read or write request should be routed 
to. But that policy is independent of data center.

Per the documentation 
(http://www.datastax.com/drivers/java/2.0/apidocs/com/datastax/driver/core/policies/LoadBalancingPolicy.html)
 , one can see  that if you have multiple data centers, it's probably better to 
use DCAwareRoundRobinPolicy, which gives preference to the local data center. 
The client program needs to know which datacenter it resides in (e.g., DC1).


private void connect() {
    if (m_session != null) {
        return;
    }
    String[] components = m_cassandraNode.split(",");
    Builder builder = Cluster.builder();
    for (String component : components) {
        builder.addContactPoint(component);
    }
    long start = System.currentTimeMillis();
    // Prefer coordinators in the local data center; optionally wrap with
    // token awareness so requests go to a replica for the partition key.
    LoadBalancingPolicy loadBalancingPolicy = new DCAwareRoundRobinPolicy(localDataCenterName);
    if (useTokenAwarePolicy) {
        loadBalancingPolicy = new TokenAwarePolicy(loadBalancingPolicy);
    }
    m_cluster = builder.withLoadBalancingPolicy(loadBalancingPolicy).build();
    m_session = m_cluster.connect();
    prepareQueries();
    float seconds = 0.001f * (System.currentTimeMillis() - start);
    System.out.println("Connected to cassandra host " + m_cassandraNode
            + " in " + seconds + " seconds.");
}


-Original Message-
From: Duncan Sands [mailto:duncan.sa...@gmail.com] 
Sent: Thursday, January 30, 2014 1:19 AM
To: user@cassandra.apache.org
Subject: Re: Question about local reads with multiple data centers

Hi Donald, which driver are you using?  With the datastax python driver you 
need to use the DCAwareRoundRobinPolicy for the load balancing policy if you 
want the driver to distinguish between your data centres; otherwise by default 
it round-robins requests amongst all nodes regardless of which data centre they 
are in, and regardless of which data centre the nodes you told it to connect to 
are in.  Probably it is the same for the other datastax drivers.

Best wishes, Duncan.

On 30/01/14 02:07, Donald Smith wrote:
 We have two datacenters, DC1 and DC2 in our test cluster. Our *write* 
 process uses a connection string with just the two hosts in DC1. Our *read* 
 process uses
 a connection string just with the two hosts in DC2.   We use a
 PropertyFileSnitch, with replication set to 'DC1':2, 'DC2':1 between data 
 centers.

 I notice from the *read* process's logs that the reader adds ALL the 
 hosts (in both datacenters) to the list of queried hosts.

 My question: will the *read* process try to read first locally from the
 datacenter DC2 I specified in its connection string? I presume so.  (I 
 doubt
 that it uses the client's IP address to decide which datacenter is 
 closer. And I am unaware of another way to tell it to read locally.)

 Also, will read repair happen between datacenters automatically 
 (read_repair_chance=0.10)?  Or does that only happen within a 
 single data center?

 We're using Cassandra 2.0.4  and CQL.

 Thank you

 *Donald A. Smith*| Senior Software Engineer
 P: 425.201.3900 x 3866
 C: (206) 819-5965
 F: (646) 443-2333
 dona...@audiencescience.com




Question about local reads with multiple data centers

2014-01-29 Thread Donald Smith
We have two datacenters, DC1 and DC2 in our test cluster. Our write process 
uses a connection string with just the two hosts in DC1. Our read process uses 
a connection string just with the two hosts in DC2.   We use a 
PropertyFileSnitch, with replication set to 'DC1':2, 'DC2':1 between data 
centers.
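
Expressed as a keyspace definition, that replication setting corresponds to (a 
sketch; the keyspace name is illustrative):

CREATE KEYSPACE myks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 1};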

I notice from the read process's logs that the reader adds ALL the hosts (in 
both datacenters) to the list of queried hosts.

My question: will the read process try to read first locally from the 
datacenter DC2 I specified in its connection string? I presume so.  (I 
doubt that it uses the client's IP address to decide which datacenter is 
closer. And I am unaware of another way to tell it to read locally.)

Also, will read repair happen between datacenters automatically 
(read_repair_chance=0.10)?  Or does that only happen within a single data 
center?
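
Both read repair knobs are per-table CQL options: read_repair_chance applies 
across all data centers, while dclocal_read_repair_chance confines read repair to 
the coordinator's data center. A sketch (the table name is illustrative):

ALTER TABLE myks.mytable
    WITH read_repair_chance = 0.10
    AND dclocal_read_repair_chance = 0.0;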

We're using Cassandra 2.0.4  and CQL.

Thank you

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com

RE: No deletes - is periodic repair needed? I think not...

2014-01-27 Thread Donald Smith
Last week I made a feature request to apache cassandra along these lines:  
https://issues.apache.org/jira/browse/CASSANDRA-6611

Don

From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Monday, January 27, 2014 4:05 PM
To: user@cassandra.apache.org
Subject: Re: No deletes - is periodic repair needed? I think not...

If you have only ttl columns, and you never update the column I would not think 
you need a repair.

Repair cures lost deletes. If all your writes have a TTL, a lost write should 
not matter, since the column was never written to the node and thus could never 
be resurrected on said node.

Unless I am missing something.

On Monday, January 27, 2014, Laing, Michael michael.la...@nytimes.com wrote:
 Thanks Sylvain,
 Your assumption is correct!
 So I think I actually have 4 classes:
 1. Regular values, no deletes, no overwrites, write heavy, variable ttl's 
 to manage size
 2. Regular values, no deletes, some overwrites, read heavy (10 to 1), 
 fixed ttl's to manage size
 2.a. Regular values, no deletes, some overwrites, read heavy (10 to 1), 
 variable ttl's to manage size
 3. Counter values, no deletes, update heavy, rotation/truncation to manage 
 size
 Only 2.a. above requires me to do 'periodic repair'.
 What I will actually do is change my schema and applications slightly to 
 eliminate the need for overwrites on the only table I have in that category.
 And I will set gc_grace_period to 0 for the tables in the updated schema and 
 drop 'periodic repair' from the schedule.
 Cheers,
 Michael


 On Mon, Jan 27, 2014 at 4:22 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 By periodic repair, I'll assume you mean having to run repair every 
 gc_grace period to make sure no deleted entries resurrect. With that 
 assumption:


 1. Regular values, no deletes, no overwrites, write heavy, ttl's to manage 
 size

 Since 'repair within gc_grace' is about preventing values that have been 
 deleted from resurrecting, if you do no deletes nor overwrites, you're at no risk 
 of that (and don't need to 'repair within gc_grace').


 2. Regular values, no deletes, some overwrites, read heavy (10 to 1), ttl's 
 to manage size

 It depends a bit. In general, if you always set the exact same TTL on every 
 insert (implying you always set a TTL), then you have nothing to worry 
 about. If the TTL varies (or if you only set a TTL some of the time), then 
 you might still need to have some periodic repairs. That being said, if 
 there are no deletes but only TTLs, then the TTL kind of lengthens the period 
 at which you need to do repair: instead of needing to repair within 
 gc_grace, you only need to repair every gc_grace + min(TTL) (where min(TTL) 
 is the smallest TTL you set on columns).

 3. Counter values, no deletes, update heavy, rotation/truncation to manage 
 size

 No deletes and no TTL implies that you're fine (as in, there is no need for 
 'repair within gc_grace').

 --
 Sylvain


--
Sorry this was sent from mobile. Will do less grammar and spell check than 
usual.
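
A minimal CQL sketch of the pattern discussed in this thread, assuming a fixed 
TTL on every write and gc_grace_seconds dropped to 0 (the table, columns, and TTL 
value are illustrative):

ALTER TABLE timeline_events WITH gc_grace_seconds = 0;

-- every write carries the same fixed TTL, so expired data cannot be resurrected
INSERT INTO timeline_events (id, value) VALUES ('k1', 'v1') USING TTL 86400;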


Benchmarks of cassandra replication across data centers?

2014-01-24 Thread Donald Smith
Does anyone know of any good benchmark data about cassandra replication across 
data centers?  I'm aware of the articles below.

This article 
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html 
from netflix is about benchmarking Cassandra scalability using AWS. It shows 
linear scaling up to 288 nodes.  I don't see much data about the cost of 
cross-data-center replication.

This compares AWS, Rackspace and Google cloud performance for Cassandra: 
http://www.stackdriver.com/cassandra-aws-gce-rackspace/. AWS does well.

These Netflix slides say they run a nightly repair job to make sure everything 
stays consistent:  
http://www.slideshare.net/adrianco/cassandra-performance-on-aws
They also talk about backing up Cassandra.   Data size reached only 5GB per 
node (tiny!).  Slide 35 says there's 100+ ms latency between US and EU 
datacenters.
Slide 36 shows how to add a new datacenter with no down time (pre-load from a 
back-up), then do repair jobs.

http://www.odbms.org/blog/2011/05/measuring-the-scalability-of-sql-and-nosql-systems/
  compares cassandra with three other systems. Cassandra performed well.

They open-sourced the benchmark code; it is available at 
https://github.com/brianfrankcooper/YCSB.  Complex.
  Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com