Re: What does the rate signify for latency in the JMX Metrics?
They are exponentially decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra/Metrics might help

On 04/17/2014 06:06 PM, Redmumba wrote:
Good afternoon, I'm attempting to integrate the metrics generated via JMX into our internal framework; however, the information for several of the metrics includes a One/Five/Fifteen-minute rate, with the RateUnit in SECONDS. For example:

$get -b org.apache.cassandra.metrics:name=Latency,scope=Write,type=ClientRequest *
#mbean = org.apache.cassandra.metrics:name=Latency,scope=Write,type=ClientRequest:
LatencyUnit = MICROSECONDS;
EventType = calls;
RateUnit = SECONDS;
MeanRate = 383.6944837362387;
FifteenMinuteRate = 868.8420188648543;
FiveMinuteRate = 817.5239450236011;
OneMinuteRate = 675.7673129014964;
Max = 498867.0;
Count = 31257426;
Min = 52.0;
50thPercentile = 926.0;
Mean = 1063.114029159023;
StdDev = 1638.1542477604232;
75thPercentile = 1064.75;
95thPercentile = 1304.55;
98thPercentile = 1504.39992;
99thPercentile = 2307.35104;
999thPercentile = 10491.8502;

What does the rate signify in this context? For example, given the OneMinuteRate of 675.7673129014964 and the unit of seconds--what is this measuring? Is this the rate at which metrics are submitted? I.e., were there an average of (676 * 60 seconds) metrics submitted over the last minute? Thanks!
Re: What does the rate signify for latency in the JMX Metrics?
> What does the rate signify in this context? For example, given the OneMinuteRate of 675.7673129014964 and the unit of seconds--what is this measuring?

It means that there were about 675 write requests per second, averaged over the last minute. As Other Chris (tm) mentioned, the rates are exponentially decaying averages; the latency percentiles, in turn, come from an exponentially decaying reservoir: it uses an exponentially decaying sample of 1028 elements, which offers a 99.9% confidence level with a 5% margin of error assuming a normal distribution, and an alpha factor of 0.015, which heavily biases the sample to the past 5 minutes of measurements. http://dimacs.rutgers.edu/~graham/pubs/papers/fwddecay.pdf

---
Chris Lohfink

On May 7, 2014, at 1:00 PM, Chris Burroughs chris.burrou...@gmail.com wrote:
They are exponentially decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra/Metrics might help [snip]
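For reference, the per-second normalization comes from how the Metrics library computes these rates. As I recall from its source (treat the tick interval and constants below as assumptions about that library, not something stated in this thread), each rate is an exponentially weighted moving average ticked every 5 seconds:

    alpha = 1 - e^(-5/60)                       # smoothing factor for the one-minute window
    rate  = rate + alpha * (count/5 - rate)     # count = events seen in the last 5-second tick

so OneMinuteRate is already in events per second; multiplying by 60 (675 * 60 ≈ 40,500) gives the approximate number of writes over the last minute, which answers the (676 * 60 seconds) question above.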
Re: clearing tombstones?
I tried to do this, however the doubling in disk space is not temporary as you state in your note. What am I missing?

On Fri, Apr 11, 2014 at 10:44 AM, William Oberman ober...@civicscience.com wrote:
So, if I was impatient and just wanted to make this happen now, I could:
1.) Change GCGraceSeconds of the CF to 0
2.) run nodetool compact (*)
3.) Change GCGraceSeconds of the CF back to 10 days
Since I have ~900M tombstones, even if I miss a few due to impatience, I don't care *that* much, as I could re-run my clean-up tool against the now much smaller CF.
(*) A long, long time ago I seem to recall reading advice about never running nodetool compact, but I can't remember why. Is there any bad long-term consequence? Short term there are several:
-a heavy operation
-temporary 2x disk space
-one big SSTable afterwards
But moving forward, everything is ok, right? CommitLog/MemTable->SSTables, minor compactions that merge SSTables, etc... The only flaw I can think of is that it will take forever until the SSTable minor compactions build up enough to consider including the big SSTable in a compaction, making it likely I'll have to self-manage compactions.

On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy mark.re...@boxever.com wrote:
Correct, a tombstone will only be removed after the gc_grace period has elapsed. The default value is set to 10 days, which allows a great deal of time for consistency to be achieved prior to deletion. If you are operationally confident that you can achieve consistency via anti-entropy repairs within a shorter period, you can always reduce that 10 day interval.
Mark

On Fri, Apr 11, 2014 at 3:16 PM, William Oberman ober...@civicscience.com wrote:
I'm seeing a lot of articles about a dependency between removing tombstones and GCGraceSeconds, which might be my problem (I just checked, and this CF has GCGraceSeconds of 10 days).

On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli tbarbu...@gmail.com wrote:
compaction should take care of it; for me it never worked, so I run nodetool compact on every node; that does it.

2014-04-11 16:05 GMT+02:00 William Oberman ober...@civicscience.com:
I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool repair, or time (as in just wait)? I had a CF that was more or less storing session information. After some time, we decided that one piece of this information was pointless to track (and was 90%+ of the columns, and in 99% of those cases was ALL columns for a row). I wrote a process to remove all of those columns (which again, in a vast majority of cases, had the effect of removing the whole row). This CF had ~1 billion rows, so I expect to be left with ~100M rows. After I did this mass delete, everything was the same size on disk (which I expected, knowing how tombstoning works). It wasn't 100% clear to me what to poke to cause compactions to clear the tombstones. First I tried nodetool cleanup on a candidate node, but afterwards the disk usage was the same. Then I tried nodetool repair on that same node, but again, disk usage is still the same. The CF has no snapshots. So, am I misunderstanding something? Is there another operation to try? Do I have to just wait? I've only done cleanup/repair on one node. Do I have to run one or the other over all nodes to clear tombstones? Cassandra 1.2.15 if it matters. Thanks!
will
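For concreteness, the impatient path William outlines maps to the following per column family (a sketch assuming a CQL3 table named sessions; the names are illustrative, and gc_grace_seconds = 0 is only safe if you will not need the grace window for repairs in the meantime):

    ALTER TABLE sessions WITH gc_grace_seconds = 0;
    -- then, on each node:
    nodetool compact <keyspace> sessions
    ALTER TABLE sessions WITH gc_grace_seconds = 864000;   -- back to 10 days

As discussed above, tombstones are only dropped by a compaction that runs after gc_grace has elapsed, and only on the node where that compaction runs; this is why cleanup and repair on a single node left the disk usage unchanged.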
Multi-dc cassandra keyspace
Hi, It seems like it should be possible to have a keyspace replicated to only a subset of DCs on a cluster spanning multiple DCs? Is there anything bad about this approach?

Scenario: a cluster spanning 4 DCs = CA, TX, NY, UT, with multiple keyspaces such that:
* keyspace_CA_TX - replication_strategy = {CA = 3, TX = 3}
* keyspace_UT_NY - replication_strategy = {UT = 3, NY = 3}
* keyspace_CA_UT - replication_strategy = {UT = 3, CA = 3}

I am going to try this out, but was curious if anybody out there has tried it. Thanks
Anand
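For concreteness, a sketch of the keyspace definitions Anand describes in CQL3 (the DC names must match whatever the snitch reports):

    CREATE KEYSPACE keyspace_CA_TX
      WITH replication = {'class': 'NetworkTopologyStrategy', 'CA': 3, 'TX': 3};
    CREATE KEYSPACE keyspace_UT_NY
      WITH replication = {'class': 'NetworkTopologyStrategy', 'UT': 3, 'NY': 3};

NetworkTopologyStrategy places no replicas in any DC that is not listed, so replicating a keyspace to only a subset of the cluster's DCs is a supported configuration.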
Erase old sstables to make room for new sstables
In the system we're using, we have a large fleet of servers constantly appending time-based data to our database--it's largely writes, very few reads (it's auditing data). However, our cluster's max space is around 80TB, and we'd like to maximize how much data we can retain. One option is to delete all old records, or to set a TTL, but that requires a substantial clean-up process that we could easily avoid if we were able to just flat-out drop the oldest sstables. I.e., when we get to 90% disk space, drop the oldest sstable.

Obviously, the oldest sstable on one node may not be the same as the oldest sstable on another, but since this is the oldest data, that is an acceptable inconsistency. Is this possible to do safely? The data in the oldest sstable is always guaranteed to be the oldest data, so that is not my concern--my main concern is whether or not we can even do this, and also how we can notify Cassandra that an sstable has been removed underneath it.

tl;dr: Can I routinely remove the oldest sstable to free up disk space, without causing stability problems in Cassandra?

Thanks for your feedback!
Andrew
Tombstones
Does cassandra delete tombstones during simple LCS compaction or should I use nodetool repair? Thanks.
Re: Disable reads during node rebuild
That'll be really useful, thanks!!

On Wed, May 14, 2014 at 7:47 PM, Aaron Morton aa...@thelastpickle.com wrote:
As of 2.0.7, driftx has added this long-requested feature. Thanks A
-
Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/05/2014, at 9:36 am, Robert Coli rc...@eventbrite.com wrote:
On Mon, May 12, 2014 at 10:18 AM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote:
Is there a way to disable reads from a node while performing rebuild from another datacenter? I tried starting the node in write survey mode, but the nodetool rebuild command does not work in this mode.

As of 2.0.7, driftx has added this long-requested feature. https://issues.apache.org/jira/browse/CASSANDRA-6961 Note that it is impossible to completely close the race window here as long as writes are incoming; this functionality just dramatically shortens it.
=Rob

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
Re: Efficient bulk range deletions without compactions by dropping SSTables.
Just a few data points from our experience.

One of our use cases involves storing a periodic full base state for millions of records, then fairly frequent delta updates to subsets of the records in between. C* is great for this because we can read the whole row (or up to the clustering key/column marking “now” as perceived by the client) and munge the base + deltas together in the client. To keep rows small (and for recovery), we start over in a new CF whenever we start a new base state. The upshot is that we have pretty much the same scenario as Jeremy is describing.

For this use case we are also using Astyanax (but C* 2.0.5). We have not come across many of the schema problems you mention (which is likely attributable to some changes in the 2.0.x line), however one thing to note is that Astyanax itself seems to be very picky about unresolved schema changes. We found that we had to do the schema changes via a CQL “create table” (we can still use Astyanax for that) rather than creating them via old-style thrift CF creation.

On May 13, 2014, at 9:42 AM, Jeremy Powell jeremym.pow...@gmail.com wrote:
Hi Kevin,

C* version: 1.2.xx
Astyanax: 1.56.xx

We basically do this same thing in one of our production clusters, but rather than dropping SSTables, we drop Column Families. We time-bucket our CFs, and when a CF has passed some time threshold (metadata or embedded in the CF name), it is dropped. This means there is a home-grown system that is doing the bookkeeping/maintenance rather than relying on C*'s inner workings. It is unfortunate that we have to maintain a system which maintains CFs, but we've been in a pretty good state for the last 12 months using this method.

Some caveats: By default, C* makes snapshots of your data when a table is dropped. You can leave that and have something else clear up the snapshots, or if you're less paranoid, set auto_snapshot: false in the cassandra.yaml file.

Cassandra does not handle 'quick' schema changes very well, and we found that only one node should be used for these changes. When adding or removing column families, we have a single, property-defined C* node that is designated as the schema node. After making a schema change, we had to throw in an artificial delay to ensure that the schema change propagated through the cluster before making the next schema change. And of course, relying on a single node being up for schema changes is less than ideal, so handling failover to a new node is important.

The final, and hardest, problem is that C* can't really handle schema changes while a node is being bootstrapped (new nodes, replacing a dead node). If a column family is dropped, but the new node has not yet received that data from its replica, the node will fail to bootstrap when it finally begins to receive that data - there is no column family for the data to be written to, so that node will be stuck in the joining state, and its system keyspace needs to be wiped and re-synced to attempt to get back to a happy state. This unfortunately means we have to stop schema changes when a node needs to be replaced, but we have this flow down pretty well.

Hope this helps,
Jeremy Powell

On Mon, May 12, 2014 at 5:53 PM, Kevin Burton bur...@spinn3r.com wrote:
We have a log-only data structure… everything is appended and nothing is ever updated. We should be totally fine with having lots of SSTables sitting on disk, because even if we did a major compaction the data would still look the same. By 'lots' I mean maybe 1000 max. Maybe 1GB each.
However, I would like a way to delete older data. One way to solve this could be to just drop an entire SSTable if all the records inside have tombstones. Is this possible, to just drop a specific SSTable?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
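A sketch of the time-bucketing scheme Jeremy describes (the table name and columns are illustrative, not from his system):

    CREATE TABLE logs_2014_w20 (
      source uuid,
      ts timeuuid,
      line text,
      PRIMARY KEY (source, ts)
    );
    -- writers always target the current bucket; to expire the oldest week:
    DROP TABLE logs_2014_w12;

combined with auto_snapshot: false in cassandra.yaml (or a separate job that clears snapshots), since DROP otherwise snapshots the very data it is supposed to free.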
Re: Mutation messages dropped
It means asynchronous write mutations were dropped, but if the writes are completing without TimedOutException, then at least ConsistencyLevel replicas were correctly written. The remaining replicas will eventually be fixed by hinted handoff, anti-entropy (repair) or read repair. More info: http://wiki.apache.org/cassandra/FAQ#dropped_messages

Please note that 1 mutation != 1 record. For instance, if 1 row has N columns, then a write for that row will have N mutations AFAIK (please correct me if I'm wrong).

On Fri, May 9, 2014 at 8:52 AM, Raveendran, Varsha IN BLR STS varsha.raveend...@siemens.com wrote:
Hello, I am writing around 10 million records continuously into a single-node Cassandra (2.0.5). In the Cassandra log file I see an entry “272 MUTATION messages dropped in last 5000ms”. Does this mean that 272 records were not written successfully?
Thanks,
Varsha

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
Re: What % of cassandra developers are employed by Datastax?
Perhaps the committers should invite other developers that have shown an interest in contributing to Cassandra. The rate of adding new non-DataStax committers appears to have been low over the last 2 years. I have no data to support that; it's just a feeling based on personal observations over the last 3 years.
Re: Query first 1 columns for each partitioning keys in CQL?
Hello,

Have you looked at using the CLUSTERING ORDER BY and LIMIT features of CQL3? These may help you achieve your goals.
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/cassandrasummit14

On Fri, May 16, 2014 at 12:23 AM, Matope Ono matope@gmail.com wrote:
Hi, I'm modeling some queries in CQL3. I'd like to query the first row for each partition key. For example:

create table posts(
  author ascii,
  created_at timeuuid,
  entry text,
  primary key(author, created_at)
);

insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by john');
insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by john');
insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by mike');
insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by mike');

And I want results like below:

mike,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by mike
john,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by john

I think this is what SELECT FIRST statements did in CQL2. The only way I came across in CQL3 is to retrieve whole records and drop the rest manually, but that's obviously not efficient. Could you please tell me a more straightforward way to do this in CQL3?
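To make the suggestion concrete: with the clustering order reversed at table-creation time, the newest entry per author is the first row of its partition, so a LIMIT 1 query per partition key returns it. A sketch against Matope's schema (note there was no single-query "first N per partition" in CQL3 at this point, so it is one query per author):

    create table posts(
      author ascii,
      created_at timeuuid,
      entry text,
      primary key(author, created_at)
    ) with clustering order by (created_at desc);

    select * from posts where author = 'john' limit 1;
    select * from posts where author = 'mike' limit 1;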
Re: What % of cassandra developers are employed by Datastax?
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. That said, active involvement varies and there are other contributors inside DataStax and in the community. You can look at the dev mailing list as well to look for involvement in more detail.

On 16 May 2014, at 10:28, Janne Jalkanen janne.jalka...@ecyrd.com wrote:
Don’t know, but as a potential customer of DataStax I’m also concerned at the fact that there does not seem to be a competitor offering Cassandra support and services. [snip]
RE: NTS, vnodes and 0% chance of data loss
Why not use NetworkTopologyStrategy and specify each region as a ‘DC’? Set up a snitch (PropertyFile or Gossip, or even the EC2Region one) to list which nodes are in which DC. Then when creating the keyspace, specify NetworkTopologyStrategy, with RF 1 in each DC / rack. I.e.:

CREATE KEYSPACE fred WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': '1', 'DC3': '1', 'DC1': '1'};

Regards
Mark Farnan

From: William Oberman [mailto:ober...@civicscience.com]
Sent: Tuesday, May 13, 2014 11:11 PM
To: user@cassandra.apache.org
Subject: NTS, vnodes and 0% chance of data loss

I found this: http://mail-archives.apache.org/mod_mbox/cassandra-user/201404.mbox/%3ccaeduwd1erq-1m-kfj6ubzsbeser8dwh+g-kgdpstnbgqsqc...@mail.gmail.com%3E
I read the three referenced cases. In addition, case 4123 references: http://www.mail-archive.com/dev@cassandra.apache.org/msg03844.html
And even though I *think* I understand all of the issues now, I still want to double check...

Assumptions:
-A cluster using NTS with options [DC:3]
-Physical layout = in the DC, 3 nodes/rack for a total of 9 nodes

No vnodes: I could do token selection using ideas from case 3810 such that each rack has one replica. At this point, my 0% chance of data loss scenarios are:
1.) Failure of two nodes at random
2.) Failure of 2 racks (6 nodes!)

Vnodes: my 0% chance of data loss scenarios are:
1.) Failure of two nodes at random

Which means a rack failure (3 nodes) has a non-zero chance of data loss (right?). To get specific, I'm in AWS, so racks ~= availability zones. In the years I've been in AWS, I've seen several occasions of single-zone downtime, and one time of single-zone catastrophic loss. E.g. for AWS I feel like you *have* to plan for a single zone failure, and in terms of safety first you *should* plan for two zone failures. Mitigating this data loss risk seems rough for vnodes, again if I'm understanding everything correctly:
-To ensure 0% data loss for one zone = I need RF=4
-To ensure 0% data loss for two zones = I need RF=7
I'd really like to use vnodes, but RF=7 is crazy. To reiterate what I think is the core idea of this message:
1.) for vnodes, 0% data loss = RF=(# of allowed failures at once)+1
2.) racks don't change the above equation at all

will
ANN Cassaforte 1.3.0 is released
Cassaforte [1] is a Clojure client for Cassandra built around CQL and focusing on ease of use. Release notes: http://blog.clojurewerkz.org/blog/2014/05/15/cassaforte-1-dot-3-0-is-released/ 1. http://clojurecassandra.info -- MK http://github.com/michaelklishin http://twitter.com/michaelklishin
Re: What % of cassandra developers are employed by Datastax?
On 05/14/2014 03:39 PM, Kevin Burton wrote:
I'm curious what % of cassandra developers are employed by Datastax?

http://wiki.apache.org/cassandra/Committers

--
Kind regards,
Michael
Re: What % of cassandra developers are employed by Datastax?
Don’t know, but as a potential customer of DataStax I’m also concerned at the fact that there does not seem to be a competitor offering Cassandra support and services. All innovation seems to be occurring only in the OSS version or DSE(*). I’d welcome a competitor for DSE - it does not even have to be so well-rounded ;-)

(DSE is really cool, and I think DataStax is doing awesome work. I just get uncomfortable when there’s a SPoF - that’s why I’m running Cassandra in the first place ;-)
((So yes, you, exactly you who is reading this and thinking of starting a company around Cassandra, pitch me when you have a product.))
(((* Yes, Netflix is open sourcing a lot of Cassandra stuff, but I don’t think they’re planning to pivot.)))

/Janne

On 14 May 2014, at 23:39, Kevin Burton bur...@spinn3r.com wrote:
I'm curious what % of cassandra developers are employed by Datastax? … vs other companies. When MySQL was acquired by Oracle this became a big issue, because even though you can't really buy an Open Source project, you can acquire all the developers and essentially do the same thing. It would be sad if all of Cassandra's 'eggs' were in one basket and a similar situation happened with Datastax. Seems like they're doing an awesome job, to be sure, but I guess it worries me in the back of my mind.

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Mutation messages dropped
Shameless plug: http://www.evidencebasedit.com/guide-to-cassandra-thread-pools/#droppable

On May 15, 2014, at 7:37 PM, Mark Reddy mark.re...@boxever.com wrote:
Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for further details.
Mark

On Fri, May 9, 2014 at 12:52 PM, Raveendran, Varsha IN BLR STS varsha.raveend...@siemens.com wrote:
Hello, I am writing around 10 million records continuously into a single-node Cassandra (2.0.5). In the Cassandra log file I see an entry “272 MUTATION messages dropped in last 5000ms”. Does this mean that 272 records were not written successfully?
Thanks,
Varsha
Migrate a model from 0.6
Hi all, more than a year ago I wrote about migrating an old schema to a new model. Since the company had other priorities we never carried it out, and now I'm trying to upgrade my 0.6 data model to the newest 2.0 model.

The DB contains mainly comments written by users on companies. Comments must be validated (when they come into the application they are in pending status, and then they can be approved or rejected). The main queries, with very intensive use (and that should perform very fast), are:
1) Get all approved comments of a company, sorted by insertion time
2) Get all approved comments of a user, sorted by insertion time
3) Get the latest X approved comments in a city with a vote higher than Y, sorted by insertion time

User/company comments are fewer than 100 in 90% of situations: in general, when dealing with user and company comments, the amount of data is a few kilobytes. Comments in a city can number more than 200,000, and that is a fast-growing number.

In my old data model I had a companies table, a users table and a comments table, the last containing the comments, plus 3 more column families (company_comments/user_comments/city_comments) containing only a set of time-sorted uuid pointers into the comments table.

I have no idea how many tables I should keep data in under the new model. I've been reading lots of documentation; to make the model easier I thought of something like this: users and companies tables as in the old model. As for comments:

CREATE TABLE comments (
  location text,
  id timeuuid,
  status text,
  companyid uuid,
  userid uuid,
  text text,
  title text,
  vote varint,
  PRIMARY KEY ((location, status, vote), id)
) WITH CLUSTERING ORDER BY (id DESC);

create index companyid_key on comments(companyid);
create index userid_key on comments(userid);

This model should provide, out of the box, query number 3:

select * from comments where location='city' and status='approved' and vote in (3,4,5) order by id DESC limit X;

But the other 2 queries are made with secondary indexes and are client-side intensive:

select * from comments where companyid='123';
select * from comments where userid='123';

This will retrieve all company/user comments, but they are
1 - not filtered by their status
2 - not sorted in any way

Considering the amounts of data described above, how would you model the platform? Thanks for any help
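One commonly suggested alternative to the secondary indexes here is to denormalize into one table per query, so queries 1 and 2 each hit a single partition already sorted by time. A sketch (assumptions: table names are illustrative, comment text is duplicated per view, and only approved comments are written to these tables, or status is folded into the key):

    CREATE TABLE comments_by_company (
      companyid uuid,
      id timeuuid,
      userid uuid,
      text text,
      title text,
      vote varint,
      PRIMARY KEY (companyid, id)
    ) WITH CLUSTERING ORDER BY (id DESC);

    CREATE TABLE comments_by_user (
      userid uuid,
      id timeuuid,
      companyid uuid,
      text text,
      title text,
      vote varint,
      PRIMARY KEY (userid, id)
    ) WITH CLUSTERING ORDER BY (id DESC);

Since user/company comment sets are small (under 100 in 90% of cases), the write amplification is modest, and both queries become single-partition slices sorted by insertion time.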
Re: Cassandra MapReduce/Storm/ etc
Here’s a meetup talk on analytics using Cassandra, Storm, and Kafka: http://www.slideshare.net/aih1013/building-largescale-analytics-platform-with-storm-kafka-and-cassandra-nyc-storm-user-group-meetup-21st-nov-2013

-- Jack Krupansky

From: Manoj Khangaonkar
Sent: Thursday, May 8, 2014 5:43 PM
To: user@cassandra.apache.org
Subject: Cassandra MapReduce/Storm/ etc

Hi, Searching for Cassandra with MapReduce, I am finding that the search results are really dated -- from version 0.7, 2010/2011. Is there a good blog/article that describes how to use MapReduce on a Cassandra table?

From my naive understanding, Cassandra is all about partitioning. Querying is based on partition key + clustered column(s). Input to MapReduce is a sequence of key/value pairs. For Storm it is a stream of tuples. If a database table is the input source for MapReduce or Storm, then in the simple case this translates to a full table scan of the input table, which can time out and is generally not a recommended access pattern in Cassandra.

My initial reaction is that if I need to process data with MapReduce or Storm, reading it from Cassandra might not be the optimal way. Storing the output to Cassandra, however, does make sense. If anyone has links to blogs or personal experience in this area, I would appreciate it if you can share it. regards
conditional delete consistency level/timeout
Earlier I reported the following bug against C* 2.0.5: https://issues.apache.org/jira/browse/CASSANDRA-7176

It seems to be fixed in C* 2.0.7, but we are still seeing similar suspicious timeouts. We have a cluster of C* 2.0.7, DC1:3, DC2:3, with the following table:

CREATE TABLE conditional_update_lock (
  resource_id text,
  lock_id uuid,
  PRIMARY KEY (resource_id)
)

We noticed that DELETE queries against this table sometimes time out. A sample raw query executed through datastax java-driver 2.0.1 which timed out:

DELETE from conditional_update_lock where resource_id = 'STUDY_4234234.324.470' IF lock_id = da2dd547-e807-45de-9d8c-787511123f3c;

java-driver throws com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)

We set LOCAL_SERIAL and LOCAL_QUORUM as the serial consistency level and consistency level in the query options passed to the datastax Cluster.Builder. In my understanding the above query should be executed at the LOCAL_SERIAL consistency level; I wonder why the exception says it failed to run the query at LOCAL_QUORUM?

We are running a large number of queries against different tables in our cassandra cluster, but only the above one times out often. I wonder if there is anything inefficient/buggy in the implementation of conditional delete in cassandra?

Mohica
Re: Cassandra 2.0.7 always fails due to 'too many open files' error
Yes, the global limits are OK. I added cassandra to '/etc/rc.local' to make it start automatically, but it seems the modification of limits didn't take effect. I observed this as Bryan suggested, so I added ulimit -SHn 99 to '/etc/rc.local' before the cassandra start command, and it worked.

On Thu, May 8, 2014 at 3:34 AM, Nikolay Mihaylov n...@nmmm.nu wrote:
sorry, probably somebody mentioned it, but did you check the global limit?
cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr

On Mon, May 5, 2014 at 10:31 PM, Bryan Talbot bryan.tal...@playnext.com wrote:
Running
# cat /proc/$(cat /var/run/cassandra.pid)/limits
as root or your cassandra user will tell you what limits it's actually running with.

On Sun, May 4, 2014 at 10:12 PM, Yatong Zhang bluefl...@gmail.com wrote:
I am running 'repair' when the error occurred. And just a few days before, I changed the compaction strategy to 'leveled'. Don't know if this helps.

On Mon, May 5, 2014 at 1:10 PM, Yatong Zhang bluefl...@gmail.com wrote:
Cassandra is running as root:

[root@storage5 ~]# ps aux | grep java
root 1893 42.0 24.0 7630664 3904000 ? Sl 10:43 60:01 java -ea -javaagent:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms3959M -Xmx3959M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/cassandra.pid -cp /mydb/cassandra/bin/../conf:/mydb/cassandra/bin/../build/classes/main:/mydb/cassandra/bin/../build/classes/thrift:/mydb/cassandra/bin/../lib/antlr-3.2.jar:/mydb/cassandra/bin/../lib/apache-cassandra-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-clientutil-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-thrift-2.0.7.jar:/mydb/cassandra/bin/../lib/commons-cli-1.1.jar:/mydb/cassandra/bin/../lib/commons-codec-1.2.jar:/mydb/cassandra/bin/../lib/commons-lang3-3.1.jar:/mydb/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/mydb/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:/mydb/cassandra/bin/../lib/disruptor-3.0.1.jar:/mydb/cassandra/bin/../lib/guava-15.0.jar:/mydb/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/mydb/cassandra/bin/../lib/jackson-core-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jackson-mapper-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar:/mydb/cassandra/bin/../lib/jbcrypt-0.3m.jar:/mydb/cassandra/bin/../lib/jline-1.0.jar:/mydb/cassandra/bin/../lib/json-simple-1.1.jar:/mydb/cassandra/bin/../lib/libthrift-0.9.1.jar:/mydb/cassandra/bin/../lib/log4j-1.2.16.jar:/mydb/cassandra/bin/../lib/lz4-1.2.0.jar:/mydb/cassandra/bin/../lib/metrics-core-2.2.0.jar:/mydb/cassandra/bin/../lib/netty-3.6.6.Final.jar:/mydb/cassandra/bin/../lib/reporter-config-2.1.0.jar:/mydb/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/mydb/cassandra/bin/../lib/slf4j-api-1.7.2.jar:/mydb/cassandra/bin/../lib/slf4j-log4j12-1.7.2.jar:/mydb/cassandra/bin/../lib/snakeyaml-1.11.jar:/mydb/cassandra/bin/../lib/snappy-java-1.0.5.jar:/mydb/cassandra/bin/../lib/snaptree-0.1.jar:/mydb/cassandra/bin/../lib/super-csv-2.1.0.jar:/mydb/cassandra/bin/../lib/thrift-server-0.3.3.jar
org.apache.cassandra.service.CassandraDaemon

On Mon, May 5, 2014 at 1:02 PM, Philip Persad philip.per...@gmail.com wrote:
Have you tried running ulimit -a as the Cassandra user instead of as root? It is possible that you configured a high file limit for root but not for the user running the Cassandra process.

On Sun, May 4, 2014 at 6:07 PM, Yatong Zhang bluefl...@gmail.com wrote:
[root@storage5 ~]# lsof -n | grep java | wc -l
5103
[root@storage5 ~]# lsof | wc -l
6567
It's mentioned in a previous mail :)

On Mon, May 5, 2014 at 9:03 AM, nash nas...@gmail.com wrote:
The lsof command or /proc can tell you how many open files it has. How many is it?
--nash
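A more persistent alternative to patching /etc/rc.local is to raise the limit for the user in /etc/security/limits.d (a sketch, assuming the process runs as root as the ps output above shows; substitute the actual user and a value sized for your SSTable count):

    # /etc/security/limits.d/cassandra.conf
    root - nofile 100000

and then verify with cat /proc/$(cat /var/run/cassandra.pid)/limits as Bryan suggested. Note that limits.d only applies to PAM sessions, which is why a daemon started from rc.local can miss it, and why the explicit ulimit before the start command worked here.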
Re: What does the rate signify for latency in the JMX Metrics?
Unfortunately, I found the documentation to be very lackluster. However, I have actually begun to use the Yammer Metrics library in other projects, so I have a much better understanding of what it generates. Thank you for the response!

(Also, for some strange reason, I am just getting the email now, on 5/16, even though it says you replied 5/7--weird.)

Andrew

On Wed, May 7, 2014 at 11:00 AM, Chris Burroughs chris.burrou...@gmail.com wrote:
They are exponentially decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra/Metrics might help [snip]
Re: Tombstones
Nodetool cleanup deletes rows that aren't owned by specific tokens (i.e., that shouldn't be on this node), and nodetool repair makes sure data is in sync between all replicas. It is wrong to say either of these commands cleans up tombstones. Tombstones are only cleaned up during compactions, and only if they have expired past gc_grace_seconds.

Now, it is also incorrect to say that compaction always cleans up tombstones. In fact there are situations that can lead to tombstones living for a long time. SSTables are immutable, so if the SSTables that hold tombstones aren't part of a compaction, the tombstones don't get cleaned up; the behavior you are expecting is not 100% predictable.

In the case of LCS, if SSTables are promoted to another level, compaction happens and tombstones which are expired will be cleaned up. Unlike SizeTiered, in LCS there is no easy way to force compaction on SSTables. One hack I have tried in the past was to stop the node, delete the .json file that holds the level manifests, and start the node; LCS will compact all of them again to figure out the levels. Another way is to pick smaller SSTable sizes: you may have more compaction churn, but again there is no 100% guarantee that the tombstones you want will be cleaned up.

On Fri, May 16, 2014 at 9:06 AM, Omar Shibli o...@eyeviewdigital.com wrote:
Yes, but still you need to run 'nodetool cleanup' from time to time to make sure all tombstones are deleted.

On Fri, May 16, 2014 at 10:11 AM, Dimetrio dimet...@flysoft.ru wrote:
Does cassandra delete tombstones during simple LCS compaction or should I use nodetool repair? Thanks.

--
Cheers,
-Arya
Re: What % of cassandra developers are employed by Datastax?
Perhaps because the developers are working on DSE :-P

On Fri, May 16, 2014 at 8:13 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. [snip]

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Tombstones
Yes, but still you need to run 'nodetool cleanup' from time to time to make sure all tombstones are deleted.

On Fri, May 16, 2014 at 10:11 AM, Dimetrio dimet...@flysoft.ru wrote:
Does cassandra delete tombstones during simple LCS compaction or should I use nodetool repair? Thanks.
Re: Can Cassandra client programs use hostnames instead of IPs?
Thanks. My case is that there is no public IP and a VPN cannot be set up. It seems that I have to run an EMR job to operate on the AWS cassandra cluster. I got some timeout errors while running the EMR job:

java.lang.RuntimeException: Could not retrieve endpoint ranges:
  at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:333)
  at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
  at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144)
  at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:228)
  at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:213)
  at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:658)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
  at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
  at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
  at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.createThriftClient(BulkRecordWriter.java:348)
  at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:293)
  ... 12 more
Caused by: java.net.ConnectException: Connection timed out
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:579)
  at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
  ... 15 more

I'd appreciate any suggestions.

On Tue, May 13, 2014 at 7:45 AM, Ben Bromhead b...@instaclustr.com wrote:
You can set listen_address in cassandra.yaml to a hostname (http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html). Cassandra will use the IP address returned by a DNS query for that hostname.

On AWS you don't have to assign an Elastic IP; all instances come with a public IP that lasts their lifetime (if you use ec2-classic or your VPC is set up to assign them). Note that whatever hostname you set in a node's listen_address, it will need to resolve to the private IP, as AWS instances only have network access via their private address. Traffic to an instance's public IP is NATed and forwarded to the private address, so you may as well just use the node's IP address.

If you run hadoop on instances in the same AWS region, it will be able to access your Cassandra cluster via private IPs. If you run hadoop externally, just use the public IPs. If you run in a VPC without public addressing and want to connect from external hosts, you will want to look at a VPN (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html).
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359

On 13/05/2014, at 4:31 AM, Huiliang Zhang zhl...@gmail.com wrote:
Hi, Cassandra returns the IPs of the nodes in the cassandra cluster for further communication between the hadoop program and the cassandra cluster. Is there a way to configure the cassandra cluster to return hostnames instead of IPs? My cassandra cluster is on AWS and has no Elastic IPs which can be accessed from outside AWS.
Thanks,
Huiliang
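A sketch of the yaml settings in play (the values are placeholders): listen_address must be reachable by the other nodes, and broadcast_address is what gets advertised to clients and remote DCs, so a setup like

    # cassandra.yaml
    listen_address: 10.0.1.12        # private IP, used for inter-node traffic
    broadcast_address: 54.12.34.56   # public IP, if external clients must connect
    rpc_address: 0.0.0.0             # accept client connections on all interfaces

is the usual shape when a cluster must be reachable from outside the region. With no public IPs and no VPN, as in Huiliang's case, running the job inside the same network (EMR in the same region) is effectively the only option.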
Re: Really need some advices on large data considerations
Hi Michael, thanks for the reply.

> I would RAID0 all those data drives, personally, and give up managing them separately. They are on multiple PCIe controllers, one drive per channel, right?

RAID 0 is a simple way to go, but one disk failure can take the whole volume down, so I am afraid RAID 0 won't be our choice.

> I would highly suggest re-thinking how you want to set up your data model and re-plan your cluster appropriately.

Our data is large but our model is simple: most operations are reads by key, and we never update the data (we only delete periodically). Due to its 'dynamo' architecture, serving this much 'static' data from Cassandra is not a problem. What I am concerned about is the 'dynamic' part: compactions, adding/removing nodes, data re-balancing and the like. The things we care most about are scalability and the fail-over strategy, and Cassandra looks splendid for this: linear scalability, decentralized, auto-partitioning, auto-recovery. So we chose it.

> but if you are using large blobs like image data, think about putting that blob data somewhere else

Any good ideas about this?

The doc you mentioned on the datastax site is great. We're still gathering information and evaluating cassandra, and it'd be great if you have any other suggestions!

Thanks
Best
Data modeling for Pinterest-like application
Hello, I'm working on data modeling for a Pinterest-like project. There are basically two main concepts: Pin and Board, just like Pinterest, where a pin is an item containing an image, description and some other information such as a like count, and each board contains a sorted list of pins.

The board can be modeled with primary key (board_id, created_at, pin_id), where created_at is used to sort the pins of the board by date. The problem is whether I should denormalize the details of pins into the board table, or just retrieve pins by page (a page can be 10~20 pins) and then multi-get by pin_ids to obtain the details.

Since some boards are accessed very often (like the home board), denormalization seems to be a reasonable choice to enhance read performance. However, we then have to update not only the pin table but also each row in the board table that contains the pin whenever a pin is updated, which sometimes could be quite frequent (such as updating the like count). Since a pin may be contained by many boards (could be thousands), denormalization seems to bring a lot of load on the write side, as well as application code complexity.

Any suggestions as to whether our data model should go the denormalized way, or the normalized/multi-get way, which then perhaps needs a separate cache layer for reads?

Thanks,
Ziju
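A sketch of the normalized variant (the schema is illustrative, and a pins table keyed by pin_id is assumed): the board table stores only pin ids in time order, and a page of details is fetched with a second query:

    CREATE TABLE board (
      board_id uuid,
      created_at timeuuid,
      pin_id uuid,
      PRIMARY KEY (board_id, created_at, pin_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, pin_id ASC);

    -- page of the 20 newest pins on a board:
    SELECT pin_id FROM board WHERE board_id = ? LIMIT 20;
    -- then multi-get the details:
    SELECT * FROM pins WHERE pin_id IN (?, ?, ...);

With denormalization the second query disappears, but every like-count update fans out to one write per containing board, which is exactly the trade-off being weighed.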
Re: Storing log structured data in Cassandra without compactions for performance boost.
If you make the timestamp the partition key, you won't be able to do range queries (unless you use an ordered partitioner). Assuming you are logging from multiple devices, you will want your partition key to be the device id plus the date, your clustering key to be the timestamp (timeuuids are good to prevent collisions), and then the log message, levels etc. as the other columns.

Then you can also create a new table for every week (or day/month, depending on how much granularity you want) and just write to the current week's table. This step allows you to delete old data without Cassandra using tombstones (you just drop the table for the week of logs you want to delete). For a much clearer explanation see http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries (the last few slides).

As for compaction, I would leave it enabled, as having lots of sstables hanging around can make range queries slower (the query has more files to visit). See http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction (a little old but still relevant). Compaction also fixes up things like merging row fragments (when you write new columns to the same row).

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 07/05/2014, at 10:55 AM, Kevin Burton bur...@spinn3r.com wrote:
I'm looking at storing log data in Cassandra… Every record is a unique timestamp for the key, and then the log line for the value. I think it would be best to just disable compactions:
- there will never be any deletes.
- all the data will be accessed by time range (probably partitioned randomly) and sequentially.
So every time a memtable flushes, we will just keep that SSTable forever. Compacting the data is kind of redundant in this situation. I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered. Also, it would be IDEAL to be able to tell cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone. Is this possible?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
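A sketch of the layout Ben describes (the column names are illustrative):

    CREATE TABLE logs_2014_w19 (
      device_id uuid,
      day text,            -- date bucket, part of the partition key
      ts timeuuid,         -- clustering key; timeuuid avoids collisions
      level text,
      message text,
      PRIMARY KEY ((device_id, day), ts)
    );

    -- range query within one device/day partition:
    SELECT * FROM logs_2014_w19
      WHERE device_id = ? AND day = '2014-05-07'
        AND ts > minTimeuuid('2014-05-07 10:00+0000');

and a DROP TABLE on the oldest weekly table replaces tombstone-based expiry, per the slides linked above.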
Re: How long are expired values actually returned?
Thank you for your answer; I really appreciate that you want to help me. But I already found out that I did something wrong in my implementation.

On 13.05.2014 02:53, Chris Lohfink wrote:
That is not expected. What client are you using and how are you setting the TTLs? What version of Cassandra?

---
Chris Lohfink

On May 8, 2014, at 9:44 AM, Sebastian Schmidt isib...@gmail.com wrote:
Hi, I'm using the TTL feature for my application. In my tests, when using a TTL of 5, the inserted rows are still returned after 7 seconds, and after 70 seconds. Is this normal or am I doing something wrong?
Kind Regards,
Sebastian
Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
Hello Anton,

What version of Cassandra are you using? Between 1.2.6 and 2.0.6, setInputRange(startToken, endToken) is not working. This was fixed in 2.0.7: https://issues.apache.org/jira/browse/CASSANDRA-6436

If you can't upgrade, you can copy AbstractCFIF and CFIF into your project and apply the patch there.

Cheers,
Paulo

On Wed, May 14, 2014 at 10:29 PM, Anton Brazhnyk anton.brazh...@genesys.com wrote:
Greetings, I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just part of it - something like Spark's sample() function. Cassandra's API seems to allow this with its ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but it doesn't work: the limit is just ignored and the entire column family is scanned. It seems this kind of feature is just not supported, and the source of AbstractColumnFamilyInputFormat.getSplits confirms that (IMO). Questions:
1. Am I right that there is no way to get data limited by token range with ColumnFamilyInputFormat?
2. Is there another way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that the amount is predictable (like 5% of the entire dataset)?
WBR,
Anton

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
Tombstones on secondary indexes
My system log is full of messages like this one:

WARN [ReadStage:42] 2014-05-15 08:19:13,615 SliceQueryFilter.java (line 210) Read 0 live and 2829 tombstoned cells in TrafficServer.rawData.rawData_evaluated_idx (see tombstone_warn_threshold)

I've run a major compaction but the tombstones are not removed. https://issues.apache.org/jira/browse/CASSANDRA-4314 seems to say that tombstones on secondary indexes are not removed by a compaction. Do I need to do it manually?

Best regards,
Joel Samuelsson
Storing globally sorted data
Let's say I have an external job (MR, pig, etc.) sorting a cassandra table by some complicated mechanism. We want to store the sorted records BACK into cassandra so that clients can read the records sorted.

What I was just thinking of doing was storing the records as pages. So page 0 would have records 0-999… We would just have the key be the page ID and then the values be the primary keys of the records so that they can be fetched. I could also denormalize the data and store the records inline as a materialized view, but of course this would require much more disk space.

Thoughts on this strategy?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
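A sketch of the page layout (an illustrative schema, assuming the records live in their own table keyed by record_key): the page id is the partition key and the position within the page is the clustering key, so a page comes back in one ordered slice:

    CREATE TABLE sorted_pages (
      page_id int,
      position int,
      record_key uuid,      -- primary key of the underlying record
      PRIMARY KEY (page_id, position)
    );

    SELECT record_key FROM sorted_pages WHERE page_id = 0;  -- records 0-999, in order

followed by a multi-get on record_key: exactly the pointer-then-fetch trade-off against a fatter, denormalized materialized view.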
Re: Really need some advices on large data considerations
You can watch this: https://www.youtube.com/watch?v=uoggWahmWYI - Aaron discusses support for big nodes.

On Wed, May 14, 2014 at 3:13 AM, Yatong Zhang bluefl...@gmail.com wrote:
Thank you Aaron, but we're planning about 20T per node; is that feasible?

On Mon, May 12, 2014 at 4:33 PM, Aaron Morton aa...@thelastpickle.com wrote:
> We've learned that compaction strategy would be an important point, cause we've run into 'no space' trouble because of the 'sized tiered' compaction strategy.
If you want to get the most out of the raw disk space, LCS is the way to go; remember it uses approximately twice the disk IO.

> From our experience, changing any settings/schema once a large cluster is online and has been running for some time is really, really a pain.
Which parts in particular? Updating the schema or config? OpsCenter has a rolling restart feature which can be handy when chef / puppet is deploying the config changes. Schema / gossip can take a little while to propagate with a high number of nodes.

On a modern version you should be able to run 2 to 3 TB per node, maybe higher. The biggest concerns are going to be repair (the changes in 2.1 will help) and bootstrapping. I’d recommend testing a smaller cluster, say 12 nodes, with a high load per node, 3TB.

cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 12:09 pm, Yatong Zhang bluefl...@gmail.com wrote:
Hi, We're going to deploy a large Cassandra cluster at the PB level. Our scenario would be:
1. Lots of writes, about 150 writes/second on average, and about 300K per write.
2. Relatively very few reads.
3. Our data will never be updated.
4. But we will delete old data periodically to free space for new data.

We've learned that compaction strategy is an important point, because we've run into 'no space' trouble with the 'sized tiered' compaction strategy. We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations - is this enough, and up to date? From our experience, changing any settings/schema once a large cluster is online and has been running for some time is really, really a pain. So we're gathering more info and expecting some more practical suggestions before we set up the cassandra cluster. Thanks, and any help is greatly appreciated.
Re: How does cassandra page through low cardinality indexes?
Hello Kevin,

For the internal workings of secondary indexes and LIMIT, you can have a look at this: https://issues.apache.org/jira/browse/CASSANDRA-5975. The comments and attached patch will give you a hint on how LIMIT is implemented. Alternatively, you can look directly in the source code, starting from the classes modified in the patch.

On Fri, May 16, 2014 at 7:53 PM, Kevin Burton bur...@spinn3r.com wrote:
I'm struggling with cassandra secondary indexes, since the documentation seems all over the place and I'm having to put together everything from blog posts. Anyway: if I have a low-cardinality index of say 10 values, and 1M records, then each secondary index key will have references to 100,000 rows. How does Cassandra page through the rows when using LIMIT and paging by the reference? Are the row references sorted in the index? Thanks!

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Best partition type for Cassandra with JBOD
That, and nobarrier… and probably noop for the scheduler if using SSDs, and setting readahead to zero...

On Fri, May 16, 2014 at 10:29 AM, James Campbell ja...@breachintelligence.com wrote:
Hi all—

What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64-bit?

The datastax production server guidelines recommend XFS for data partitions, saying, “Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.”

However, the same document also notes that “Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node,” which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group?

I also noted that the guidelines don’t mention setting the noatime and nodiratime flags in the fstab for data volumes, but I wonder if that’s a common practice.

James

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
ownership not equally distributed
Hello I have a 4-node cluster where 2 nodes are in one data center and the other 2 are in a different one. But in the first data center the token ownership is not equally distributed. I am using the vnodes feature; num_tokens is set to 256 on all nodes and initial_token is left blank.
Datacenter: DC1
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address        Load      Tokens  Owns   Host ID                               Rack
UN 10.145.84.167  84.58 MB  256     0.4%   ce5ddceb-b1d4-47ac-8d85-249aa7c5e971  RAC1
UN 10.145.84.166  692.69 MB 255     44.2%  e6b5a0fd-20b7-4bf9-9a8e-715cfc823be6  RAC1
Datacenter: DC2
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address        Load      Tokens  Owns   Host ID                               Rack
UN 10.168.67.43   476 MB    256     27.8%  05dc7ea6-0328-43b8-8b70-bcea856ba41e  RAC1
UN 10.168.67.42   413.15 MB 256     27.7%  677025f0-780c-45dc-bb3b-17ad260fba7d  RAC1
I've run nodetool repair a couple of times, but it didn't help. On the node with less ownership I have seen frequent full GCs a couple of times and had to restart Cassandra. Any suggestions on how to resolve this are highly appreciated. Regards, Rameez
Re: Best partition type for Cassandra with JBOD
Hi, Recommending nobarrier (mount option barrier=0) when you don't know if a non-volatile cache is in play is probably not the way to go. A non-volatile cache will typically ignore write barriers if a given block device is configured to cache writes anyways. I am also skeptical you will see a boost in performance. Applications that want to defer and batch writes won't emit write barriers frequently, and when they do it's because the data has to be there. Filesystems depend on write barriers, although it is surprisingly hard to get a reordering that is really bad because of the way journals are managed. Cassandra uses log structured storage and supports asynchronous periodic group commit, so it doesn't need to emit write barriers frequently. Setting read ahead to zero on an SSD is necessary to get the maximum number of random reads, but will also disable prefetching for sequential reads. You need a lot less prefetching with an SSD due to the much faster response time, but it's still many microseconds. Someone with more Cassandra-specific knowledge can probably give better advice as to when a non-zero read ahead makes sense with Cassandra. This may be workload specific as well. Regards, Ariel On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote: That and nobarrier… and probably noop for the scheduler if using SSD and setting readahead to zero... On Fri, May 16, 2014 at 10:29 AM, James Campbell [1]ja...@breachintelligence.com wrote: Hi all— What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64bit? The datastax production server guidelines recommend XFS for data partitions, saying, “Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.” However, the same document also notes that “Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node,” which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group? I also noted that the guidelines don’t mention setting noatime and nodiratime flags in the fstab for data volumes, but I wonder if that’s a common practice. James -- Founder/CEO [2]Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: [3]http://burtonator.wordpress.com … or check out my [4]Google+ profile [5][spinn3r.jpg] War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. References 1. mailto:ja...@breachintelligence.com 2. http://Spinn3r.com/ 3. http://burtonator.wordpress.com/ 4. https://plus.google.com/102718274791889610666/posts 5. http://spinn3r.com/
Best partition type for Cassandra with JBOD
Hi all- What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64bit? The datastax production server guidelines recommend XFS for data partitions, saying, "Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit." However, the same document also notes that "Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node," which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group? I also noted that the guidelines don't mention setting noatime and nodiratime flags in the fstab for data volumes, but I wonder if that's a common practice. James
Questions on Leveled Compaction sizing and compaction corner cases
I was reading http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra and need some confirmation: A) Sizing. "Each level is ten times as large as the previous." In the comments: on October 14, 2011 at 12:33 am (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-18817) Jonathan said: "L1 gets 50MB (~10 sstables of data), L2 gets 500MB/100 sstables, L3 gets 5GB." On January 22, 2013 at 7:51 pm (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-196897) he said: "Remember that within a level, data is guaranteed not to overlap across sstables", or put another way, "a given row will be in at most one sstable". On February 11, 2013 at 7:32 am (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-196901) he said: "A compaction will run whenever there is more data in level N > 0 than desired (sstable_size_in_mb * 10^level)". On February 22, 2013 at 8:20 am (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-196904) he said: "Leveled compaction restricts sstable size to 5MB or a single row, whichever is larger." If I put all the info together: 1) sizeOf(Ln+1) = 10 * sizeOf(Ln) = sstable_size_in_mb * 10^(n+1); 2) the size of an sstable is limited to sstable_size_in_mb by default, or more if a single partition is large enough to exceed sstable_size_in_mb; 3) because of point 2), the equality ssTableCount(Ln+1) = 10 * ssTableCount(Ln) does not always hold. Is this correct so far? B) Compaction corner cases. Now, one of the biggest selling points of LCS is its frequent compaction and that 90% of reads only touch 1 SSTable. Fine. Let's suppose we have data in 4 levels (taking the new default of 160MB for sstable_size_in_mb): L0, L1 (1.6GB), L2 (16GB), L3 (160GB, partially filled). For some reason there was a burst of writes in the application, so data got compacted up to L3. Now that the write/update workload is back to normal, compaction never goes beyond L2. In this case, all my old/deleted/obsolete data in L3 will never be compacted, will it? Or only at the next write burst, right? Regards Duy Hai DOAN
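For reference, a hedged sketch of where sstable_size_in_mb is set, with the sizing arithmetic above worked out in comments (keyspace and table names are placeholders):

    -- With sstable_size_in_mb = 160, each level holds roughly:
    --   L1: 160 MB * 10   = 1.6 GB
    --   L2: 160 MB * 100  = 16 GB
    --   L3: 160 MB * 1000 = 160 GB
    ALTER TABLE my_ks.my_table
      WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                          'sstable_size_in_mb' : 160 };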
Re: Storing globally sorted data
What you show is basically the idea of bucketing data. One bucket = one physical partition. Within each bucket there is a fixed number of columns (1000 in your example). This strategy works fine and avoids overly large partitions. The only drawback I see is the need to fetch data across buckets, but since in your case you fetch data by partition it should be OK. About denormalizing, it's the way to go. Disk space is sometimes cheaper than the high read latency caused by a normalized data model. On Fri, May 16, 2014 at 8:41 PM, Kevin Burton bur...@spinn3r.com wrote: Let's say I have an external job (MR, pig, etc) sorting a cassandra table by some complicated mechanism. We want to store the sorted records BACK into cassandra so that clients can read the records sorted. What I was just thinking of doing was storing the records as pages. So page 0 would have records 0-999…. We would just have the key be the page ID and then the values be the primary keys for the records so that they can be fetched. I could also denormalize the data and store them inline as a materialized view, but of course this would require much more disk space. Thoughts on this strategy? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
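For illustration, a minimal CQL sketch of the page-bucket idea (table and column names are hypothetical):

    CREATE TABLE sorted_pages (
        page_id   int,
        seq       int,    -- position within the page, 0-999
        record_id uuid,   -- primary key of the record in the source table
        PRIMARY KEY (page_id, seq)
    );
    -- fetch page 0 (the first 1000 records in sorted order) in one query
    SELECT record_id FROM sorted_pages WHERE page_id = 0;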
Re: Data modeling for Pinterest-like application
"The problem is whether I should denormalize details of pins into the board table or just retrieve pins by page (page size can be 10~20) and then multi-get by pin_ids to obtain details" -- Denormalizing is the best way to go in your case. Otherwise, for 1 board read, you'll have 10-20 subsequent reads to load the pins. Multiply that by the number of users listing boards and you'll quickly be in trouble... For updating a pin's like count, you'll need to use the counter type. "denormalization seems to bring a lot of load on the write side as well as application code complexity" -- First, C* copes quite well with write load. Second, you should ask yourself: how often is the update scenario vs the read scenario? Usually the read pattern is predominant. As for update code complexity, it's the price to pay for read performance. The CQRS pattern will help you separate the write and read stages, as will heavy unit and integration testing. On Fri, May 16, 2014 at 5:14 AM, ziju feng pkdog...@gmail.com wrote: Hello, I'm working on data modeling for a Pinterest-like project. There are basically two main concepts: Pin and Board, just like Pinterest, where a pin is an item containing an image, description and some other information such as a like count, and each board contains a sorted list of pins. The board can be modeled with primary key (board_id, created_at, pin_id), where created_at is used to sort the pins of the board by date. The problem is whether I should denormalize details of pins into the board table or just retrieve pins by page (page size can be 10~20) and then multi-get by pin_ids to obtain details. Since there are some boards that are accessed very often (like the home board), denormalization seems to be a reasonable choice to enhance read performance. However, we then have to update not only the pin table but also each row in the board table that contains the pin whenever a pin is updated, which sometimes could be quite frequent (such as updating the like count). Since a pin may be contained by many boards (could be thousands), denormalization seems to bring a lot of load on the write side as well as application code complexity. Any suggestion as to whether our data model should go the denormalized way, or the normalized/multi-get way, which then perhaps needs a separate cache layer for reads? Thanks, Ziju
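A hedged CQL sketch of this model (all names are hypothetical). Note that counter columns cannot share a table with regular columns, so the like count goes in its own table, which also sidesteps rewriting it in every board row:

    CREATE TABLE board_pins (
        board_id    uuid,
        created_at  timeuuid,
        pin_id      uuid,
        image_url   text,   -- denormalized pin details
        description text,
        PRIMARY KEY (board_id, created_at, pin_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, pin_id ASC);

    CREATE TABLE pin_likes (
        pin_id uuid PRIMARY KEY,
        likes  counter
    );
    UPDATE pin_likes SET likes = likes + 1
     WHERE pin_id = 123e4567-e89b-12d3-a456-426655440000;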
Re: Backup procedure
It's also good to note that only the Data files are compressed already. Depending on your data, the Index and other files may be a significant percentage of the total on-disk data. On 05/02/2014 01:14 PM, tommaso barbugli wrote: In my tests, compressing sstables with lzop (with Cassandra compression turned on) resulted in approx. 50% smaller files. That's probably because the chunks of data compressed by lzop are way bigger than the average size of writes performed on Cassandra (not sure how data is compressed but I guess it is done per single cell, so unless one stores…) 2014-05-02 19:01 GMT+02:00 Robert Coli rc...@eventbrite.com: On Fri, May 2, 2014 at 2:07 AM, tommaso barbugli tbarbu...@gmail.com wrote: If you are thinking about using Amazon S3 storage, I wrote a tool that performs snapshots and backups on multiple nodes. Backups are stored compressed on S3. https://github.com/tbarbugli/cassandra_snapshotter https://github.com/JeremyGrosser/tablesnap SSTables in Cassandra are compressed by default, if you are re-compressing them you may just be wasting CPU.. :) =Rob
Number of rows under one partition key
Hi, I know this has been discussed before, and I know there are limitations to how many rows one partition key can handle in practice. But I am not sure whether the number of rows or the total data size is the deciding factor. I know the thrift interface well, but this is my first project where we are actively using CQL, so this is also new for me. The case is like this: We have a partition key, clientid, which has a clustering key (id). The number of rows with the same clientid is normally between 10,000 and 100,000, I would guess. The data is pretty small, let's say 200 bytes per row on average (probably even smaller, but for the example let's assume 200 bytes). There _CAN_ however be more rows with the same clientid in some edge cases, I would guess up to 1,000,000. Most of the time we read with both id and clientid, or we read for example 1,000 rows with just clientid. It would be nice to be able to fetch all rows in one query, if possible. The ratio of reads to writes is about 100 to 1. Of the writes, I guess updates versus inserts is about 1:4. Deletes are rare. Currently the production environment is Cassandra 1.2.11, but we are testing this on Cassandra 2.0.something in our development environment. Questions: Should we add another partition key component to avoid 1,000,000 rows in the same thrift row (which is how I understand it is actually stored)? Or is 1,000,000 rows okay? If we add a bucketid-ish thing to the partition key, how should we do queries most effectively? Since reading is the most important, and writing and space are not an issue, should we have a high replication factor and read from (relatively) few nodes? When it comes to consistency, it isn't a problem waiting for everything to be replicated to responsive nodes (within some ms or even seconds), but if a node goes down and contains very old data (multiple minutes, hours or days), that would be a problem, at least if it happened regularly. What, in practice, is the cost of reading with a high number of nodes in the consistency level? Does "replicate to 4 nodes, read from 2" sound like an OK option here (avoiding full consistency, but at the same time, if one node crashes and comes up with old data we would still get a pretty consistent result; the probability of 2 of the nodes crashing at the same time is low, and _maybe_ something we can live with in this specific case)? Other considerations, for example compaction strategy, and whether we should upgrade to 2.0 because of this (we will upgrade anyway, but if it is recommended we will continue to use 2.0 in development and upgrade the production environment sooner)? I have done some testing, inserting a million rows and selecting them all, counting them and selecting individual rows (with both clientid and id), and it seems fine, but I want to ask to be sure that I am on the right track. Best regards, Vegard Berget
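If bucketing turns out to be needed, a minimal CQL sketch of a composite partition key (all names are hypothetical):

    CREATE TABLE client_rows (
        clientid text,
        bucket   int,      -- e.g. derived from id, keeping each bucket well under ~100k rows
        id       timeuuid,
        payload  text,
        PRIMARY KEY ((clientid, bucket), id)
    );
    -- each bucket is read with its own query (or with IN over the bucket values)
    SELECT * FROM client_rows WHERE clientid = 'c42' AND bucket = 0;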
Re: Couter column family performance problems
On Mon, May 12, 2014 at 3:03 PM, Batranut Bogdan batra...@yahoo.com wrote: I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d counter Feel free to comment and share experiences about counter CF performance. Briefly: 1) The original counter implementation is slow and somewhat fragile. 2) Counters have been reworked in 2.1 to be decently fast and less fragile. 3) Probably neither should be used in extreme-volume cases (high rate of update) or when high counter accuracy is required. https://issues.apache.org/jira/browse/CASSANDRA-6504 =Rob
Re: Tombstones
Note that Cassandra will not compact away some tombstones if you have differing column TTLs. See the following jira and resolution I filed for this: https://issues.apache.org/jira/browse/CASSANDRA-6654 On May 16, 2014 4:49 PM, Chris Lohfink clohf...@blackbirdit.com wrote: It will delete them after gc_grace_seconds (set per table) and a compaction. --- Chris Lohfink On May 16, 2014, at 9:11 AM, Dimetrio dimet...@flysoft.ru wrote: Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Failed to mkdirs $HOME/.cassandra
For now you can edit the nodetool script itself by adding -Duser.home=/tmp, as in:
    $JAVA $JAVA_AGENT -cp $CLASSPATH -Xmx32m -Duser.home=/tmp -Dlogback.configurationFile=logback-tools.xml -Dstorage-config=$CASSANDRA_CONF org.apache.cassandra.tools.NodeTool -p $JMX_PORT $ARGS
If you like, you can file an issue in JIRA. On 2014-05-09 18:42, Bryan Talbot wrote: How should the nodetool command be run as the user nobody? The nodetool command fails with an exception if it cannot create a .cassandra directory in the current user's home directory. I'd like to schedule some nodetool commands to run with least privilege as cron jobs. I'd like to run them as the nobody user -- which typically has / as the home directory -- since that's what the user is typically used for (minimum privileges). None of the methods described in this JIRA actually seem to work (with 2.0.7 anyway): https://issues.apache.org/jira/browse/CASSANDRA-6475 [1] Testing as a normal user with no write permissions to the home directory (to simulate the nobody user):
    [vagrant@local-dev ~]$ nodetool version
    ReleaseVersion: 2.0.7
    [vagrant@local-dev ~]$ rm -rf .cassandra/
    [vagrant@local-dev ~]$ chmod a-w .
    [vagrant@local-dev ~]$ nodetool flush my_ks my_cf
    Exception in thread main FSWriteError in /home/vagrant/.cassandra
      at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305)
      at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690)
      at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504)
      at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204)
    Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra
      ... 4 more
    [vagrant@local-dev ~]$ HOME=/tmp nodetool flush my_ks my_cf
    (same FSWriteError stack trace)
    [vagrant@local-dev ~]$ env HOME=/tmp nodetool flush my_ks my_cf
    (same FSWriteError stack trace)
    [vagrant@local-dev ~]$ env user.home=/tmp nodetool flush my_ks my_cf
    (same FSWriteError stack trace)
    [vagrant@local-dev ~]$ nodetool -Duser.home=/tmp flush my_ks my_cf
    Unrecognized option: -Duser.home=/tmp
    usage: java org.apache.cassandra.tools.NodeCmd --host arg command ...
Links: -- [1] https://issues.apache.org/jira/browse/CASSANDRA-6475
Re: What % of cassandra developers are employed by Datastax?
You can always check the project committer wiki: http://wiki.apache.org/cassandra/Committers -- Jack Krupansky From: Kevin Burton Sent: Wednesday, May 14, 2014 4:39 PM To: user@cassandra.apache.org Subject: What % of cassandra developers are employed by Datastax? I'm curious what % of cassandra developers are employed by Datastax? … vs other companies. When MySQL was acquired by Oracle this became a big issue because even though you can't really buy an Open Source project, you can acquire all the developers and essentially do the same thing. It would be sad if all of Cassandra's 'eggs' were in one basket and a similar situation happens with Datastax. Seems like they're doing an awesome job to be sure but I guess it worries me in the back of my mind. -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Query the first row for each partition key in CQL?
Hi, I'm modeling some queries in CQL3. I'd like to query the first row for each partition key. For example:
create table posts(
    author ascii,
    created_at timeuuid,
    entry text,
    primary key(author, created_at)
);
insert into posts(author, created_at, entry) values ('john', minTimeuuid('2013-02-02 10:00+0000'), 'This is an old entry by john');
insert into posts(author, created_at, entry) values ('john', minTimeuuid('2013-03-03 10:00+0000'), 'This is a new entry by john');
insert into posts(author, created_at, entry) values ('mike', minTimeuuid('2013-02-02 10:00+0000'), 'This is an old entry by mike');
insert into posts(author, created_at, entry) values ('mike', minTimeuuid('2013-03-03 10:00+0000'), 'This is a new entry by mike');
And I want results like below:
mike, 1c4d9000-83e9-11e2-8080-808080808080, This is a new entry by mike
john, 1c4d9000-83e9-11e2-8080-808080808080, This is a new entry by john
I think this is what SELECT FIRST statements did in CQL2. The only way I've come across in CQL3 is to retrieve whole records and drop the extras manually, but that's obviously not efficient. Could you please tell me a more straightforward way in CQL3?
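For readers on newer versions: this now has a direct answer. Cassandra 3.6 added PER PARTITION LIMIT (CASSANDRA-7017); nothing equivalent existed in the 2.0-era CQL3 this thread discusses:

    SELECT author, created_at, entry FROM posts PER PARTITION LIMIT 1;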
Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
Hi Anton, One approach you could look at is to write a custom InputFormat that allows you to limit the token range of rows that you fetch (if the AbstractColumnFamilyInputFormat does not do what you want). Doing so is not too much work. If you look at the class RowIterator within CqlRecordReader, you can see code in the constructor that creates a query with a certain token range: ResultSet rs = session.execute(cqlQuery, type.compose(type.fromString(split.getStartToken())), type.compose(type.fromString(split.getEndToken())) ); I think you can make a new version of the InputFormat and just tweak this method to achieve what you want. Alternatively, if you just want to get a sample of the data, you might want to change the InputFormat itself such that it chooses to query only a subset of the total input splits (or CfSplits). That might be easier. Best regards, Clint On Wed, May 14, 2014 at 6:29 PM, Anton Brazhnyk anton.brazh...@genesys.com wrote: Greetings, I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just part of it - something like Spark's sample() function. Cassandra's API seems allow to do it with its ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but it doesn't work. The limit is just ignored and the entire column family is scanned. It seems this kind of feature is just not supported and sources of AbstractColumnFamilyInputFormat.getSplits confirm that (IMO). Questions: 1. Am I right that there is no way to get some data limited by token range with ColumnFamilyInputFormat? 2. Is there other way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that this amount is predictable (like 5% of entire dataset)? WBR, Anton
How does cassandra page through low cardinality indexes?
I'm struggling with cassandra secondary indexes since the documentation seems all over the place and I'm having to put together everything from blog posts. Anyway. If I have a low cardinality index of say 10 values, and 1M records, this means each secondary index key will have references to 100,000 rows. How does Cassandra page through the rows when using LIMIT and paging by the reference? Are the row references sorted in the index? Thanks! -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Tombstones
It will delete them after gc_grace_seconds (set per table) and a compaction. --- Chris Lohfink On May 16, 2014, at 9:11 AM, Dimetrio dimet...@flysoft.ru wrote: Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
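For reference, gc_grace_seconds is an ordinary table property. A hedged example (keyspace and table names are placeholders; 864000 seconds is the usual 10-day default):

    ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 864000;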
Re: What % of cassandra developers are employed by Datastax?
There does seem to be some effort to encourage others - DataStax had some talks explaining how to contribute, and this year there is even an extra bootcamp: http://learn.datastax.com/CassandraSummitBootcampApplication.html On May 16, 2014, at 9:47 AM, Peter Lin wool...@gmail.com wrote: perhaps the committers should invite other developers who have shown an interest in contributing to Cassandra. The rate of adding new non-DataStax committers appears to have been low the last 2 years. I have no data to support it; it's just a feeling based on personal observations over the last 3 years.
Re: Storing log structured data in Cassandra without compactions for performance boost.
If the data being read is a slice of a partition that has been added to over time, there will be a part of that row in almost every sstable. That would mean all of them (multiple disk seeks per sstable, depending on clustering order) would have to be read in order to service the query. The data model can help or hurt a lot, though. Yes… totally agree, but we wouldn't do that. The entire 'row' is immutable and passes through the system and then expires due to TTL. TTL is probably the way to go here, especially if Cassandra just drops the whole SSTable on TTL expiration, which is what I think I'm hearing. If you set the TTL for the columns you added, then C* will clean up sstables (if size-tiered and post-1.2) once the data has expired. Since you never delete, set gc_grace_seconds to 0 so the TTL expiration doesn't result in lingering tombstones. Thanks! Kevin -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
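A hedged CQL sketch of this pattern (all names are hypothetical; default_time_to_live is a table-level option from 2.0 onward):

    CREATE TABLE event_log (
        stream  text,
        ts      timeuuid,
        payload text,
        PRIMARY KEY (stream, ts)
    ) WITH gc_grace_seconds = 0            -- no manual deletes, so no grace period needed
      AND default_time_to_live = 604800;   -- every column expires after 7 days

    -- a TTL can also be set per write:
    INSERT INTO event_log (stream, ts, payload)
    VALUES ('crawler-1', now(), '...') USING TTL 604800;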
Re: What % of cassandra developers are employed by Datastax?
I used cassandra for years at NYSE, and we were able to do what we wanted with it by leveraging open source and internal development, knowing that cassandra did what we wanted it to do and that no one could ever take the code away from us in a worst-case scenario. Compare and contrast that with the pure proprietary model, and I'm sure it will help you sleep easier. -- Colin Clark +1-320-221-9531 On May 15, 2014, at 10:52 AM, Jack Krupansky j...@basetechnology.com wrote: You can always check the project committer wiki: http://wiki.apache.org/cassandra/Committers -- Jack Krupansky From: Kevin Burton Sent: Wednesday, May 14, 2014 4:39 PM To: user@cassandra.apache.org Subject: What % of cassandra developers are employed by Datastax? I'm curious what % of cassandra developers are employed by Datastax? … vs other companies. When MySQL was acquired by Oracle this became a big issue because even though you can't really buy an Open Source project, you can acquire all the developers and essentially do the same thing. It would be sad if all of Cassandra's 'eggs' were in one basket and a similar situation happens with Datastax. Seems like they're doing an awesome job to be sure but I guess it worries me in the back of my mind. -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Data modeling for Pinterest-like application
Thanks for your answer, I really like the frequency-of-update vs. read way of thinking. A related question is whether it is a good idea to denormalize the read-heavy part of the data while normalizing other, less frequently accessed data. Our app will have a limited number of system-managed boards that are viewed by every user, so it makes sense to denormalize and propagate updates of pins to these boards. We will also have a like board for each user containing pins that they like, which can be somewhat private and only viewed by the owner. Since a pin can potentially be liked by thousands of users, if we also denormalize the like board, every time that pin is liked by another user we would have to update the like count in thousands of like boards. Does normalization work better in this case, or can Cassandra handle this kind of write load? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-modeling-for-Pinterest-like-application-tp7594481p7594517.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Multi-dc cassandra keyspace
It's often an excellent strategy. No known issues. -Tupshin On May 16, 2014 4:13 PM, Anand Somani meatfor...@gmail.com wrote: Hi, It seems like it should be possible to have a keyspace replicated only to a subset of DCs on a cluster spanning multiple DCs? Is there anything bad about this approach? Scenario: a cluster spanning 4 DCs = CA, TX, NY, UT, with multiple keyspaces such that * keyspace_CA_TX - replication_strategy = {CA = 3, TX = 3} * keyspace_UT_NY - replication_strategy = {UT = 3, NY = 3} * keyspace_CA_UT - replication_strategy = {UT = 3, CA = 3} I am going to try this out, but was curious if anybody out there has tried it. Thanks Anand
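For reference, this maps directly onto per-DC replication factors with NetworkTopologyStrategy; a minimal sketch (the DC names must match what your snitch reports):

    CREATE KEYSPACE keyspace_ca_tx
      WITH replication = { 'class' : 'NetworkTopologyStrategy',
                           'CA' : 3, 'TX' : 3 };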
Re: Mutation messages dropped
Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for further details. Mark On Fri, May 9, 2014 at 12:52 PM, Raveendran, Varsha IN BLR STS varsha.raveend...@siemens.com wrote: Hello, I am writing around 10 million records continuously into a single-node Cassandra (2.0.5). In the Cassandra log file I see an entry "272 MUTATION messages dropped in last 5000ms". Does this mean that 272 records were not written successfully? Thanks, Varsha
Index with same Name but different keyspace
Hi, I am using Cassandra version 2.0.5. I am trying to set up 2 keyspaces with the same tables for different kinds of testing. While creating indexes on the tables, I realized I am not able to use the same index name even though the tables are in different keyspaces. Is a unique index name across keyspaces a requirement/feature? -- Regards, Mahesh Rajamani
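A workaround sketch, assuming the limitation holds on this version: give each index an explicit name that embeds the keyspace (table and column names are hypothetical):

    CREATE INDEX ks1_users_email_idx ON ks1.users (email);
    CREATE INDEX ks2_users_email_idx ON ks2.users (email);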
Re: Couter column family performance problems
What version are you using? And what consistency level are you using for your inserts? CL.ONE, for instance, can end up with a large backlog in the replicateOnWrite (or CounterMutation, depending on version) stage, since that stage happens outside the feedback loop of the request and can be a little slow. If nodetool tpstats shows large pending/blocked counts there, you might be overrunning your capacity. --- Chris Lohfink On May 12, 2014, at 5:03 PM, Batranut Bogdan batra...@yahoo.com wrote: Hello all, I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d counter After inserting a few million keys... 55 mil, the performance goes down the drain. 2-3 nodes in the cluster are under medium load, and when inserting batches of the same length, writes take longer and longer until the whole cluster becomes loaded and I get a lot of TExceptions... and the cluster becomes unresponsive. Did anyone have the same problem? Feel free to comment and share experiences about counter CF performance.
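For context, a minimal sketch of the kind of counter writes being discussed (the schema is from the original post; the table name and key are hypothetical, and the consistency level is chosen in the client driver rather than in CQL):

    CREATE TABLE counters (
        pk text PRIMARY KEY,
        a counter, b counter, c counter, d counter
    );
    -- counter columns can only be modified by increment/decrement
    UPDATE counters SET a = a + 1, b = b + 1 WHERE pk = 'some-key';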
Running Production Cluster at Rackspace
Hi, can anyone point me to recommendations for hosting and configuration requirements when running a Production Cassandra Cluster at Rackspace? Are there reference projects that document the suitability of Rackspace for running a production Cassandra cluster? Jan
RE: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
Hi Paulo, I'm using C* 1.2.15 and have no easy option to upgrade (at least not to the 2.0.* branch). I've started looking into implementing my own variant of InputFormat. Thanks a lot for the hint; I will definitely check how it's done in 2.0.7 and whether it's possible to backport it to the 1.2.* branch. WBR, Anton From: Paulo Ricardo Motta Gomes [mailto:paulo.mo...@chaordicsystems.com] Sent: Thursday, May 15, 2014 3:21 AM To: user@cassandra.apache.org Subject: Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat) Hello Anton, What version of Cassandra are you using? If between 1.2.6 and 2.0.6, setInputRange(startToken, endToken) is not working. This was fixed in 2.0.7: https://issues.apache.org/jira/browse/CASSANDRA-6436 If you can't upgrade, you can copy AbstractCFIF and CFIF to your project and apply the patch there. Cheers, Paulo On Wed, May 14, 2014 at 10:29 PM, Anton Brazhnyk anton.brazh...@genesys.com wrote: Greetings, I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just part of it - something like Spark's sample() function. Cassandra's API seems to allow this with its ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but it doesn't work. The limit is just ignored and the entire column family is scanned. It seems this kind of feature is just not supported and the sources of AbstractColumnFamilyInputFormat.getSplits confirm that (IMO). Questions: 1. Am I right that there is no way to get some data limited by token range with ColumnFamilyInputFormat? 2. Is there another way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that this amount is predictable (like 5% of the entire dataset)? WBR, Anton -- Paulo Motta Chaordic | Platform www.chaordic.com.br +55 48 3232.3200
null date bug? Not sure if its cassandra 2.0.5 or the gocql (golang) driver.
I'm noticing the following strange behaviour when I do a query on a table:
cqlsh:mykeyspace> select uuid, discontinued_from from mytable;
 uuid                                 | discontinued_from
--------------------------------------+--------------------------
 b838a632-dd61-11e3-a32e-b8f6b11b1965 | -6795364578.871
 b838e9b4-dd61-11e3-a330-b8f6b11b1965 | -6795364578.871
 b838c725-dd61-11e3-a32f-b8f6b11b1965 | -6795364578.871
 b8390aeb-dd61-11e3-a331-b8f6b11b1965 | 2014-01-01 10:00:00+1100
 b83840b7-dd61-11e3-a32c-b8f6b11b1965 | -6795364578.871
 b83882fc-dd61-11e3-a32d-b8f6b11b1965 | -6795364578.871
(6 rows)
Failed to format value -6795364578.871 as timestamp: year out of range
Failed to format value -6795364578.871 as timestamp: year out of range
3 more decoding errors suppressed.
The discontinued_from field is being updated with a golang time.Time variable that is either correctly initialised or left as null, i.e.:
err := cql.Query("update mytable set name=?, discontinued_from=?, updated=? where uuid=?", name, discontinuedFrom, time.Now().UnixNano(), s.Id).Exec()
I would have expected that updating a timestamp with a null value would result in a null in the row. Is this a bug in gocql? Or am I misunderstanding how null can be used? (Is it not possible or allowed to set something to null?) Thanks, Jacob
Clustering order and secondary index
Hi all, I'm trying to migrate my old project, born with Cassandra 0.6 and grown through 0.7/1.0, to the latest 2.0. I have an easy question for you all: do queries using only secondary indexes not respect any clustering order? Thanks
Re: Efficient bulk range deletions without compactions by dropping SSTables.
Hello Kevin, In 2.0.X an SSTable is automatically dropped if it contains only tombstones: https://issues.apache.org/jira/browse/CASSANDRA-5228. However, this will most likely happen only if you use LCS; STCS will create sstables of larger size that will probably mix expired and unexpired data. This could be solved by the single-sstable tombstone compaction, which unfortunately is not working well (https://issues.apache.org/jira/browse/CASSANDRA-6563). I don't know of a way to manually drop specific sstables safely; you could try implementing a script that compares sstable timestamps to check whether an sstable is safely droppable, as done in CASSANDRA-5228. There are proposals to create a compaction strategy optimized for log-only data that only deletes old sstables, but it's not ready yet AFAIK. Cheers, Paulo On Mon, May 12, 2014 at 8:53 PM, Kevin Burton bur...@spinn3r.com wrote: We have a log-only data structure… everything is appended and nothing is ever updated. We should be totally fine with having lots of SSTables sitting on disk, because even if we did a major compaction the data would still look the same. By 'lots' I mean maybe 1000 max, maybe 1GB each. However, I would like a way to delete older data. One way to solve this could be to just drop an entire SSTable if all the records inside have tombstones. Is this possible, to just drop a specific SSTable? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. -- Paulo Motta Chaordic | Platform www.chaordic.com.br +55 48 3232.3200
Re: What % of cassandra developers are employed by Datastax?
so 30%… according to that data. On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote: On 05/14/2014 03:39 PM, Kevin Burton wrote: I'm curious what % of cassandra developers are employed by Datastax? http://wiki.apache.org/cassandra/Committers -- Kind regards, Michael -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.