Re: What does the rate signify for latency in the JMX Metrics?
They are exponentially decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra/Metrics might help

On 04/17/2014 06:06 PM, Redmumba wrote:
Good afternoon, I'm attempting to integrate the metrics generated via JMX into our internal framework; however, the information for several of the metrics includes a One/Five/Fifteen-minute rate, with the RateUnit in SECONDS. For example:

$get -b org.apache.cassandra.metrics:name=Latency,scope=Write,type=ClientRequest *
#mbean = org.apache.cassandra.metrics:name=Latency,scope=Write,type=ClientRequest:
LatencyUnit = MICROSECONDS;
EventType = calls;
RateUnit = SECONDS;
MeanRate = 383.6944837362387;
FifteenMinuteRate = 868.8420188648543;
FiveMinuteRate = 817.5239450236011;
OneMinuteRate = 675.7673129014964;
Max = 498867.0;
Count = 31257426;
Min = 52.0;
50thPercentile = 926.0;
Mean = 1063.114029159023;
StdDev = 1638.1542477604232;
75thPercentile = 1064.75;
95thPercentile = 1304.55;
98thPercentile = 1504.39992;
99thPercentile = 2307.35104;
999thPercentile = 10491.8502;

What does the rate signify in this context? For example, given the OneMinuteRate of 675.7673129014964 and the unit of seconds--what is this measuring? Is this the rate at which metrics are submitted? I.e., were there an average of (676 * 60 seconds) metrics submitted over the last minute? Thanks!
Re: What does the rate signify for latency in the JMX Metrics?
> What does the rate signify in this context? For example, given the OneMinuteRate of 675.7673129014964 and the unit of seconds--what is this measuring?

It means that there were about 675 write requests per second, averaged over the last minute. As Other Chris (tm) mentioned, the rates are exponentially decaying averages; the latency percentiles, in turn, come from an exponentially decaying reservoir: it uses an exponentially decaying sample of 1028 elements, which offers a 99.9% confidence level with a 5% margin of error assuming a normal distribution, and an alpha factor of 0.015, which heavily biases the sample to the past 5 minutes of measurements. http://dimacs.rutgers.edu/~graham/pubs/papers/fwddecay.pdf

---
Chris Lohfink

On May 7, 2014, at 1:00 PM, Chris Burroughs chris.burrou...@gmail.com wrote:
They are exponentially decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra/Metrics might help [snip]
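For reference, the per-second normalization comes from how the Metrics library computes these rates. As I recall from its source (treat the tick interval and constants below as assumptions about that library, not something stated in this thread), each rate is an exponentially weighted moving average ticked every 5 seconds:

    alpha = 1 - e^(-5/60)                       # smoothing factor for the one-minute window
    rate  = rate + alpha * (count/5 - rate)     # count = events seen in the last 5-second tick

so OneMinuteRate is already in events per second; multiplying by 60 (675 * 60 ≈ 40,500) gives the approximate number of writes over the last minute, which answers the (676 * 60 seconds) question above.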
Re: clearing tombstones?
I tried to do this, however the doubling in disk space is not temporary as you state in your note. What am I missing?

On Fri, Apr 11, 2014 at 10:44 AM, William Oberman ober...@civicscience.com wrote:
So, if I was impatient and just wanted to make this happen now, I could:
1.) Change GCGraceSeconds of the CF to 0
2.) run nodetool compact (*)
3.) Change GCGraceSeconds of the CF back to 10 days
Since I have ~900M tombstones, even if I miss a few due to impatience, I don't care *that* much, as I could re-run my clean-up tool against the now much smaller CF.
(*) A long, long time ago I seem to recall reading advice about never running nodetool compact, but I can't remember why. Is there any bad long-term consequence? Short term there are several:
-a heavy operation
-temporary 2x disk space
-one big SSTable afterwards
But moving forward, everything is ok, right? CommitLog/MemTable->SSTables, minor compactions that merge SSTables, etc... The only flaw I can think of is that it will take forever until the SSTable minor compactions build up enough to consider including the big SSTable in a compaction, making it likely I'll have to self-manage compactions.

On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy mark.re...@boxever.com wrote:
Correct, a tombstone will only be removed after the gc_grace period has elapsed. The default value is set to 10 days, which allows a great deal of time for consistency to be achieved prior to deletion. If you are operationally confident that you can achieve consistency via anti-entropy repairs within a shorter period, you can always reduce that 10 day interval.
Mark

On Fri, Apr 11, 2014 at 3:16 PM, William Oberman ober...@civicscience.com wrote:
I'm seeing a lot of articles about a dependency between removing tombstones and GCGraceSeconds, which might be my problem (I just checked, and this CF has GCGraceSeconds of 10 days).

On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli tbarbu...@gmail.com wrote:
compaction should take care of it; for me it never worked, so I run nodetool compact on every node; that does it.

2014-04-11 16:05 GMT+02:00 William Oberman ober...@civicscience.com:
I'm wondering what will clear tombstoned rows? nodetool cleanup, nodetool repair, or time (as in just wait)? I had a CF that was more or less storing session information. After some time, we decided that one piece of this information was pointless to track (and was 90%+ of the columns, and in 99% of those cases was ALL columns for a row). I wrote a process to remove all of those columns (which again, in a vast majority of cases, had the effect of removing the whole row). This CF had ~1 billion rows, so I expect to be left with ~100M rows. After I did this mass delete, everything was the same size on disk (which I expected, knowing how tombstoning works). It wasn't 100% clear to me what to poke to cause compactions to clear the tombstones. First I tried nodetool cleanup on a candidate node, but afterwards the disk usage was the same. Then I tried nodetool repair on that same node, but again, disk usage is still the same. The CF has no snapshots. So, am I misunderstanding something? Is there another operation to try? Do I have to just wait? I've only done cleanup/repair on one node. Do I have to run one or the other over all nodes to clear tombstones? Cassandra 1.2.15 if it matters. Thanks!
will
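For concreteness, the impatient path William outlines maps to the following per column family (a sketch assuming a CQL3 table named sessions; the names are illustrative, and gc_grace_seconds = 0 is only safe if you will not need the grace window for repairs in the meantime):

    ALTER TABLE sessions WITH gc_grace_seconds = 0;
    -- then, on each node:
    nodetool compact <keyspace> sessions
    ALTER TABLE sessions WITH gc_grace_seconds = 864000;   -- back to 10 days

As discussed above, tombstones are only dropped by a compaction that runs after gc_grace has elapsed, and only on the node where that compaction runs; this is why cleanup and repair on a single node left the disk usage unchanged.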
Multi-dc cassandra keyspace
Hi, It seems like it should be possible to have a keyspace replicated to only a subset of DCs on a cluster spanning multiple DCs? Is there anything bad about this approach?

Scenario: a cluster spanning 4 DCs = CA, TX, NY, UT, with multiple keyspaces such that:
* keyspace_CA_TX - replication_strategy = {CA = 3, TX = 3}
* keyspace_UT_NY - replication_strategy = {UT = 3, NY = 3}
* keyspace_CA_UT - replication_strategy = {UT = 3, CA = 3}

I am going to try this out, but was curious if anybody out there has tried it. Thanks
Anand
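For concreteness, a sketch of the keyspace definitions Anand describes in CQL3 (the DC names must match whatever the snitch reports):

    CREATE KEYSPACE keyspace_CA_TX
      WITH replication = {'class': 'NetworkTopologyStrategy', 'CA': 3, 'TX': 3};
    CREATE KEYSPACE keyspace_UT_NY
      WITH replication = {'class': 'NetworkTopologyStrategy', 'UT': 3, 'NY': 3};

NetworkTopologyStrategy places no replicas in any DC that is not listed, so replicating a keyspace to only a subset of the cluster's DCs is a supported configuration.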
Erase old sstables to make room for new sstables
In the system we're using, we have a large fleet of servers constantly appending time-based data to our database--it's largely writes, very few reads (it's auditing data). However, our cluster's max space is around 80TB, and we'd like to maximize how much data we can retain. One option is to delete all old records, or to set a TTL, but that requires a substantial clean-up process that we could easily avoid if we were able to just flat-out drop the oldest sstables. I.e., when we get to 90% disk space, drop the oldest sstable.

Obviously, the oldest sstable on one node may not be the same as the oldest sstable on another, but since this is the oldest data, that is an acceptable inconsistency. Is this possible to do safely? The data in the oldest sstable is always guaranteed to be the oldest data, so that is not my concern--my main concern is whether or not we can even do this, and also how we can notify Cassandra that an sstable has been removed underneath it.

tl;dr: Can I routinely remove the oldest sstable to free up disk space, without causing stability problems in Cassandra?

Thanks for your feedback!
Andrew
Tombstones
Does cassandra delete tombstones during simple LCS compaction or should I use nodetool repair? Thanks.
Re: Disable reads during node rebuild
That'll be really useful, thanks!!

On Wed, May 14, 2014 at 7:47 PM, Aaron Morton aa...@thelastpickle.com wrote:
As of 2.0.7, driftx has added this long-requested feature. Thanks A
-
Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/05/2014, at 9:36 am, Robert Coli rc...@eventbrite.com wrote:
On Mon, May 12, 2014 at 10:18 AM, Paulo Ricardo Motta Gomes paulo.mo...@chaordicsystems.com wrote:
Is there a way to disable reads from a node while performing rebuild from another datacenter? I tried starting the node in write survey mode, but the nodetool rebuild command does not work in this mode.

As of 2.0.7, driftx has added this long-requested feature. https://issues.apache.org/jira/browse/CASSANDRA-6961 Note that it is impossible to completely close the race window here as long as writes are incoming; this functionality just dramatically shortens it.
=Rob

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
Re: Efficient bulk range deletions without compactions by dropping SSTables.
Just a few data points from our experience.

One of our use cases involves storing a periodic full base state for millions of records, then fairly frequent delta updates to subsets of the records in between. C* is great for this because we can read the whole row (or up to the clustering key/column marking “now” as perceived by the client) and munge the base + deltas together in the client. To keep rows small (and for recovery), we start over in a new CF whenever we start a new base state. The upshot is that we have pretty much the same scenario as Jeremy is describing.

For this use case we are also using Astyanax (but C* 2.0.5). We have not come across many of the schema problems you mention (which is likely attributable to some changes in the 2.0.x line), however one thing to note is that Astyanax itself seems to be very picky about unresolved schema changes. We found that we had to do the schema changes via a CQL “create table” (we can still use Astyanax for that) rather than creating them via old-style thrift CF creation.

On May 13, 2014, at 9:42 AM, Jeremy Powell jeremym.pow...@gmail.com wrote:
Hi Kevin,

C* version: 1.2.xx
Astyanax: 1.56.xx

We basically do this same thing in one of our production clusters, but rather than dropping SSTables, we drop Column Families. We time-bucket our CFs, and when a CF has passed some time threshold (metadata or embedded in the CF name), it is dropped. This means there is a home-grown system that is doing the bookkeeping/maintenance rather than relying on C*'s inner workings. It is unfortunate that we have to maintain a system which maintains CFs, but we've been in a pretty good state for the last 12 months using this method.

Some caveats: By default, C* makes snapshots of your data when a table is dropped. You can leave that and have something else clear up the snapshots, or if you're less paranoid, set auto_snapshot: false in the cassandra.yaml file.

Cassandra does not handle 'quick' schema changes very well, and we found that only one node should be used for these changes. When adding or removing column families, we have a single, property-defined C* node that is designated as the schema node. After making a schema change, we had to throw in an artificial delay to ensure that the schema change propagated through the cluster before making the next schema change. And of course, relying on a single node being up for schema changes is less than ideal, so handling failover to a new node is important.

The final, and hardest, problem is that C* can't really handle schema changes while a node is being bootstrapped (new nodes, replacing a dead node). If a column family is dropped, but the new node has not yet received that data from its replica, the node will fail to bootstrap when it finally begins to receive that data - there is no column family for the data to be written to, so that node will be stuck in the joining state, and its system keyspace needs to be wiped and re-synced to attempt to get back to a happy state. This unfortunately means we have to stop schema changes when a node needs to be replaced, but we have this flow down pretty well.

Hope this helps,
Jeremy Powell

On Mon, May 12, 2014 at 5:53 PM, Kevin Burton bur...@spinn3r.com wrote:
We have a log-only data structure… everything is appended and nothing is ever updated. We should be totally fine with having lots of SSTables sitting on disk, because even if we did a major compaction the data would still look the same. By 'lots' I mean maybe 1000 max. Maybe 1GB each.
However, I would like a way to delete older data. One way to solve this could be to just drop an entire SSTable if all the records inside have tombstones. Is this possible, to just drop a specific SSTable?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
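A sketch of the time-bucketing scheme Jeremy describes (the table name and columns are illustrative, not from his system):

    CREATE TABLE logs_2014_w20 (
      source uuid,
      ts timeuuid,
      line text,
      PRIMARY KEY (source, ts)
    );
    -- writers always target the current bucket; to expire the oldest week:
    DROP TABLE logs_2014_w12;

combined with auto_snapshot: false in cassandra.yaml (or a separate job that clears snapshots), since DROP otherwise snapshots the very data it is supposed to free.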
Re: Mutation messages dropped
It means asynchronous write mutations were dropped, but if the writes are completing without TimedOutException, then at least ConsistencyLevel replicas were correctly written. The remaining replicas will eventually be fixed by hinted handoff, anti-entropy (repair) or read repair. More info: http://wiki.apache.org/cassandra/FAQ#dropped_messages

Please note that 1 mutation != 1 record. For instance, if 1 row has N columns, then a write for that row will have N mutations AFAIK (please correct me if I'm wrong).

On Fri, May 9, 2014 at 8:52 AM, Raveendran, Varsha IN BLR STS varsha.raveend...@siemens.com wrote:
Hello, I am writing around 10 million records continuously into a single-node Cassandra (2.0.5). In the Cassandra log file I see an entry “272 MUTATION messages dropped in last 5000ms”. Does this mean that 272 records were not written successfully?
Thanks,
Varsha

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
Re: What % of cassandra developers are employed by Datastax?
Perhaps the committers should invite other developers that have shown an interest in contributing to Cassandra. The rate of adding new non-DataStax committers appears to have been low over the last 2 years. I have no data to support that; it's just a feeling based on personal observations over the last 3 years.
Re: Query first 1 columns for each partitioning keys in CQL?
Hello,

Have you looked at using the CLUSTERING ORDER BY and LIMIT features of CQL3? These may help you achieve your goals.
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487
http://www.linkedin.com/in/jlacefield
http://www.datastax.com/cassandrasummit14

On Fri, May 16, 2014 at 12:23 AM, Matope Ono matope@gmail.com wrote:
Hi, I'm modeling some queries in CQL3. I'd like to query the first row for each partition key. For example:

create table posts(
  author ascii,
  created_at timeuuid,
  entry text,
  primary key(author, created_at)
);

insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by john');
insert into posts(author,created_at,entry) values ('john',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by john');
insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-02-02 10:00+'),'This is an old entry by mike');
insert into posts(author,created_at,entry) values ('mike',minTimeuuid('2013-03-03 10:00+'),'This is a new entry by mike');

And I want results like below:

mike,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by mike
john,1c4d9000-83e9-11e2-8080-808080808080,This is a new entry by john

I think this is what SELECT FIRST statements did in CQL2. The only way I came across in CQL3 is to retrieve whole records and drop the rest manually, but that's obviously not efficient. Could you please tell me a more straightforward way to do this in CQL3?
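To make the suggestion concrete: with the clustering order reversed at table-creation time, the newest entry per author is the first row of its partition, so a LIMIT 1 query per partition key returns it. A sketch against Matope's schema (note there was no single-query "first N per partition" in CQL3 at this point, so it is one query per author):

    create table posts(
      author ascii,
      created_at timeuuid,
      entry text,
      primary key(author, created_at)
    ) with clustering order by (created_at desc);

    select * from posts where author = 'john' limit 1;
    select * from posts where author = 'mike' limit 1;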
Re: What % of cassandra developers are employed by Datastax?
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. That said, active involvement varies and there are other contributors inside DataStax and in the community. You can look at the dev mailing list as well to look for involvement in more detail.

On 16 May 2014, at 10:28, Janne Jalkanen janne.jalka...@ecyrd.com wrote:
Don’t know, but as a potential customer of DataStax I’m also concerned at the fact that there does not seem to be a competitor offering Cassandra support and services. [snip]
RE: NTS, vnodes and 0% chance of data loss
Why not use NetworkTopologyStrategy and specify each region as a ‘DC’? Set up a snitch (PropertyFile or Gossip, or even the EC2Region one) to list which nodes are in which DC. Then when creating the keyspace, specify NetworkTopologyStrategy, with RF 1 in each DC / rack. I.e.:

CREATE KEYSPACE fred WITH replication = {'class': 'NetworkTopologyStrategy', 'DC2': '1', 'DC3': '1', 'DC1': '1'};

Regards
Mark Farnan

From: William Oberman [mailto:ober...@civicscience.com]
Sent: Tuesday, May 13, 2014 11:11 PM
To: user@cassandra.apache.org
Subject: NTS, vnodes and 0% chance of data loss

I found this: http://mail-archives.apache.org/mod_mbox/cassandra-user/201404.mbox/%3ccaeduwd1erq-1m-kfj6ubzsbeser8dwh+g-kgdpstnbgqsqc...@mail.gmail.com%3E
I read the three referenced cases. In addition, case 4123 references: http://www.mail-archive.com/dev@cassandra.apache.org/msg03844.html
And even though I *think* I understand all of the issues now, I still want to double check...

Assumptions:
-A cluster using NTS with options [DC:3]
-Physical layout = in the DC, 3 nodes/rack for a total of 9 nodes

No vnodes: I could do token selection using ideas from case 3810 such that each rack has one replica. At this point, my 0% chance of data loss scenarios are:
1.) Failure of two nodes at random
2.) Failure of 2 racks (6 nodes!)

Vnodes: my 0% chance of data loss scenarios are:
1.) Failure of two nodes at random

Which means a rack failure (3 nodes) has a non-zero chance of data loss (right?). To get specific, I'm in AWS, so racks ~= availability zones. In the years I've been in AWS, I've seen several occasions of single-zone downtime, and one time of single-zone catastrophic loss. E.g. for AWS I feel like you *have* to plan for a single zone failure, and in terms of safety first you *should* plan for two zone failures. Mitigating this data loss risk seems rough for vnodes, again if I'm understanding everything correctly:
-To ensure 0% data loss for one zone = I need RF=4
-To ensure 0% data loss for two zones = I need RF=7
I'd really like to use vnodes, but RF=7 is crazy. To reiterate what I think is the core idea of this message:
1.) for vnodes, 0% data loss = RF=(# of allowed failures at once)+1
2.) racks don't change the above equation at all

will
ANN Cassaforte 1.3.0 is released
Cassaforte [1] is a Clojure client for Cassandra built around CQL and focusing on ease of use. Release notes: http://blog.clojurewerkz.org/blog/2014/05/15/cassaforte-1-dot-3-0-is-released/ 1. http://clojurecassandra.info -- MK http://github.com/michaelklishin http://twitter.com/michaelklishin
Re: What % of cassandra developers are employed by Datastax?
On 05/14/2014 03:39 PM, Kevin Burton wrote:
I'm curious what % of cassandra developers are employed by Datastax?

http://wiki.apache.org/cassandra/Committers

--
Kind regards,
Michael
Re: What % of cassandra developers are employed by Datastax?
Don’t know, but as a potential customer of DataStax I’m also concerned at the fact that there does not seem to be a competitor offering Cassandra support and services. All innovation seems to be occurring only in the OSS version or DSE(*). I’d welcome a competitor for DSE - it does not even have to be so well-rounded ;-)

(DSE is really cool, and I think DataStax is doing awesome work. I just get uncomfortable when there’s a SPoF - that’s why I’m running Cassandra in the first place ;-)
((So yes, you, exactly you who is reading this and thinking of starting a company around Cassandra, pitch me when you have a product.))
(((* Yes, Netflix is open sourcing a lot of Cassandra stuff, but I don’t think they’re planning to pivot.)))

/Janne

On 14 May 2014, at 23:39, Kevin Burton bur...@spinn3r.com wrote:
I'm curious what % of cassandra developers are employed by Datastax? … vs other companies. When MySQL was acquired by Oracle this became a big issue, because even though you can't really buy an Open Source project, you can acquire all the developers and essentially do the same thing. It would be sad if all of Cassandra's 'eggs' were in one basket and a similar situation happened with Datastax. Seems like they're doing an awesome job, to be sure, but I guess it worries me in the back of my mind.

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Mutation messages dropped
Shameless plug: http://www.evidencebasedit.com/guide-to-cassandra-thread-pools/#droppable

On May 15, 2014, at 7:37 PM, Mark Reddy mark.re...@boxever.com wrote:
Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for further details.
Mark

On Fri, May 9, 2014 at 12:52 PM, Raveendran, Varsha IN BLR STS varsha.raveend...@siemens.com wrote:
Hello, I am writing around 10 million records continuously into a single-node Cassandra (2.0.5). In the Cassandra log file I see an entry “272 MUTATION messages dropped in last 5000ms”. Does this mean that 272 records were not written successfully?
Thanks,
Varsha
Migrate a model from 0.6
Hi all, more than a year ago I wrote about migrating an old schema to a new model. Since the company had other priorities we never carried it out, and now I'm trying to upgrade my 0.6 data model to the newest 2.0 model.

The DB contains mainly comments written by users on companies. Comments must be validated (when they come into the application they are in pending status, and then they can be approved or rejected). The main queries, with very intensive use (and that should perform very fast), are:
1) Get all approved comments of a company, sorted by insertion time
2) Get all approved comments of a user, sorted by insertion time
3) Get the latest X approved comments in a city with a vote higher than Y, sorted by insertion time

User/company comments are fewer than 100 in 90% of situations: in general, when dealing with user and company comments, the amount of data is a few kilobytes. Comments in a city can number more than 200,000, and that is a fast-growing number.

In my old data model I had a companies table, a users table and a comments table, the last containing the comments, plus 3 more column families (company_comments/user_comments/city_comments) containing only a set of time-sorted uuid pointers into the comments table.

I have no idea how many tables I should keep data in under the new model. I've been reading lots of documentation; to make the model easier I thought of something like this: users and companies tables as in the old model. As for comments:

CREATE TABLE comments (
  location text,
  id timeuuid,
  status text,
  companyid uuid,
  userid uuid,
  text text,
  title text,
  vote varint,
  PRIMARY KEY ((location, status, vote), id)
) WITH CLUSTERING ORDER BY (id DESC);

create index companyid_key on comments(companyid);
create index userid_key on comments(userid);

This model should provide, out of the box, query number 3:

select * from comments where location='city' and status='approved' and vote in (3,4,5) order by id DESC limit X;

But the other 2 queries are made with secondary indexes and are client-side intensive:

select * from comments where companyid='123';
select * from comments where userid='123';

This will retrieve all company/user comments, but they are
1 - not filtered by their status
2 - not sorted in any way

Considering the amounts of data described above, how would you model the platform? Thanks for any help
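One commonly suggested alternative to the secondary indexes here is to denormalize into one table per query, so queries 1 and 2 each hit a single partition already sorted by time. A sketch (assumptions: table names are illustrative, comment text is duplicated per view, and only approved comments are written to these tables, or status is folded into the key):

    CREATE TABLE comments_by_company (
      companyid uuid,
      id timeuuid,
      userid uuid,
      text text,
      title text,
      vote varint,
      PRIMARY KEY (companyid, id)
    ) WITH CLUSTERING ORDER BY (id DESC);

    CREATE TABLE comments_by_user (
      userid uuid,
      id timeuuid,
      companyid uuid,
      text text,
      title text,
      vote varint,
      PRIMARY KEY (userid, id)
    ) WITH CLUSTERING ORDER BY (id DESC);

Since user/company comment sets are small (under 100 in 90% of cases), the write amplification is modest, and both queries become single-partition slices sorted by insertion time.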
Re: Cassandra MapReduce/Storm/ etc
Here’s a meetup talk on analytics using Cassandra, Storm, and Kafka: http://www.slideshare.net/aih1013/building-largescale-analytics-platform-with-storm-kafka-and-cassandra-nyc-storm-user-group-meetup-21st-nov-2013

-- Jack Krupansky

From: Manoj Khangaonkar
Sent: Thursday, May 8, 2014 5:43 PM
To: user@cassandra.apache.org
Subject: Cassandra MapReduce/Storm/ etc

Hi, Searching for Cassandra with MapReduce, I am finding that the search results are really dated -- from version 0.7, 2010/2011. Is there a good blog/article that describes how to use MapReduce on a Cassandra table?

From my naive understanding, Cassandra is all about partitioning. Querying is based on partition key + clustered column(s). Input to MapReduce is a sequence of key/value pairs. For Storm it is a stream of tuples. If a database table is the input source for MapReduce or Storm, then in the simple case this translates to a full table scan of the input table, which can time out and is generally not a recommended access pattern in Cassandra.

My initial reaction is that if I need to process data with MapReduce or Storm, reading it from Cassandra might not be the optimal way. Storing the output to Cassandra, however, does make sense. If anyone has links to blogs or personal experience in this area, I would appreciate it if you can share it. regards
conditional delete consistency level/timeout
Earlier I reported the following bug against C* 2.0.5: https://issues.apache.org/jira/browse/CASSANDRA-7176

It seems to be fixed in C* 2.0.7, but we are still seeing similar suspicious timeouts. We have a cluster of C* 2.0.7, DC1:3, DC2:3, with the following table:

CREATE TABLE conditional_update_lock (
  resource_id text,
  lock_id uuid,
  PRIMARY KEY (resource_id)
)

We noticed that DELETE queries against this table sometimes time out. A sample raw query executed through datastax java-driver 2.0.1 which timed out:

DELETE from conditional_update_lock where resource_id = 'STUDY_4234234.324.470' IF lock_id = da2dd547-e807-45de-9d8c-787511123f3c;

java-driver throws com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)

We set LOCAL_SERIAL and LOCAL_QUORUM as the serial consistency level and consistency level in the query options passed to the datastax Cluster.Builder. In my understanding the above query should be executed at the LOCAL_SERIAL consistency level; I wonder why the exception says it failed to run the query at LOCAL_QUORUM?

We are running a large number of queries against different tables in our cassandra cluster, but only the above one times out often. I wonder if there is anything inefficient/buggy in the implementation of conditional delete in cassandra?

Mohica
Re: Cassandra 2.0.7 always fails due to 'too many open files' error
Yes, the global limits are OK. I added cassandra to '/etc/rc.local' to make it start automatically, but it seems the modification of limits didn't take effect. I observed this as Bryan suggested, so I added ulimit -SHn 99 to '/etc/rc.local' before the cassandra start command, and it worked.

On Thu, May 8, 2014 at 3:34 AM, Nikolay Mihaylov n...@nmmm.nu wrote:
sorry, probably somebody mentioned it, but did you check the global limit?
cat /proc/sys/fs/file-max
cat /proc/sys/fs/file-nr

On Mon, May 5, 2014 at 10:31 PM, Bryan Talbot bryan.tal...@playnext.com wrote:
Running
# cat /proc/$(cat /var/run/cassandra.pid)/limits
as root or your cassandra user will tell you what limits it's actually running with.

On Sun, May 4, 2014 at 10:12 PM, Yatong Zhang bluefl...@gmail.com wrote:
I am running 'repair' when the error occurred. And just a few days before, I changed the compaction strategy to 'leveled'. Don't know if this helps.

On Mon, May 5, 2014 at 1:10 PM, Yatong Zhang bluefl...@gmail.com wrote:
Cassandra is running as root:

[root@storage5 ~]# ps aux | grep java
root 1893 42.0 24.0 7630664 3904000 ? Sl 10:43 60:01 java -ea -javaagent:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms3959M -Xmx3959M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-pidfile=/var/run/cassandra.pid -cp /mydb/cassandra/bin/../conf:/mydb/cassandra/bin/../build/classes/main:/mydb/cassandra/bin/../build/classes/thrift:/mydb/cassandra/bin/../lib/antlr-3.2.jar:/mydb/cassandra/bin/../lib/apache-cassandra-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-clientutil-2.0.7.jar:/mydb/cassandra/bin/../lib/apache-cassandra-thrift-2.0.7.jar:/mydb/cassandra/bin/../lib/commons-cli-1.1.jar:/mydb/cassandra/bin/../lib/commons-codec-1.2.jar:/mydb/cassandra/bin/../lib/commons-lang3-3.1.jar:/mydb/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/mydb/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:/mydb/cassandra/bin/../lib/disruptor-3.0.1.jar:/mydb/cassandra/bin/../lib/guava-15.0.jar:/mydb/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/mydb/cassandra/bin/../lib/jackson-core-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jackson-mapper-asl-1.9.2.jar:/mydb/cassandra/bin/../lib/jamm-0.2.5.jar:/mydb/cassandra/bin/../lib/jbcrypt-0.3m.jar:/mydb/cassandra/bin/../lib/jline-1.0.jar:/mydb/cassandra/bin/../lib/json-simple-1.1.jar:/mydb/cassandra/bin/../lib/libthrift-0.9.1.jar:/mydb/cassandra/bin/../lib/log4j-1.2.16.jar:/mydb/cassandra/bin/../lib/lz4-1.2.0.jar:/mydb/cassandra/bin/../lib/metrics-core-2.2.0.jar:/mydb/cassandra/bin/../lib/netty-3.6.6.Final.jar:/mydb/cassandra/bin/../lib/reporter-config-2.1.0.jar:/mydb/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/mydb/cassandra/bin/../lib/slf4j-api-1.7.2.jar:/mydb/cassandra/bin/../lib/slf4j-log4j12-1.7.2.jar:/mydb/cassandra/bin/../lib/snakeyaml-1.11.jar:/mydb/cassandra/bin/../lib/snappy-java-1.0.5.jar:/mydb/cassandra/bin/../lib/snaptree-0.1.jar:/mydb/cassandra/bin/../lib/super-csv-2.1.0.jar:/mydb/cassandra/bin/../lib/thrift-server-0.3.3.jar
org.apache.cassandra.service.CassandraDaemon

On Mon, May 5, 2014 at 1:02 PM, Philip Persad philip.per...@gmail.com wrote:
Have you tried running ulimit -a as the Cassandra user instead of as root? It is possible that you configured a high file limit for root but not for the user running the Cassandra process.

On Sun, May 4, 2014 at 6:07 PM, Yatong Zhang bluefl...@gmail.com wrote:
[root@storage5 ~]# lsof -n | grep java | wc -l
5103
[root@storage5 ~]# lsof | wc -l
6567
It's mentioned in a previous mail :)

On Mon, May 5, 2014 at 9:03 AM, nash nas...@gmail.com wrote:
The lsof command or /proc can tell you how many open files it has. How many is it?
--nash
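A more persistent alternative to patching /etc/rc.local is to raise the limit for the user in /etc/security/limits.d (a sketch, assuming the process runs as root as the ps output above shows; substitute the actual user and a value sized for your SSTable count):

    # /etc/security/limits.d/cassandra.conf
    root - nofile 100000

and then verify with cat /proc/$(cat /var/run/cassandra.pid)/limits as Bryan suggested. Note that limits.d only applies to PAM sessions, which is why a daemon started from rc.local can miss it, and why the explicit ulimit before the start command worked here.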
Re: What does the rate signify for latency in the JMX Metrics?
Unfortunately, I found the documentation to be very lackluster. However, I have actually begun to use the Yammer Metrics library in other projects, so I have a much better understanding of what it generates. Thank you for the response!

(Also, for some strange reason, I am just getting the email now, on 5/16, even though it says you replied 5/7--weird.)

Andrew

On Wed, May 7, 2014 at 11:00 AM, Chris Burroughs chris.burrou...@gmail.com wrote:
They are exponentially decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra/Metrics might help [snip]
Re: Tombstones
Nodetool cleanup deletes rows that aren't owned by specific tokens (i.e., that shouldn't be on this node), and nodetool repair makes sure data is in sync between all replicas. It is wrong to say either of these commands cleans up tombstones. Tombstones are only cleaned up during compactions, and only if they have expired past gc_grace_seconds.

Now, it is also incorrect to say that compaction always cleans up tombstones. In fact there are situations that can lead to tombstones living for a long time. SSTables are immutable, so if the SSTables that hold tombstones aren't part of a compaction, the tombstones don't get cleaned up; the behavior you are expecting is not 100% predictable.

In the case of LCS, if SSTables are promoted to another level, compaction happens and tombstones which are expired will be cleaned up. Unlike SizeTiered, in LCS there is no easy way to force compaction on SSTables. One hack I have tried in the past was to stop the node, delete the .json file that holds the level manifests, and start the node; LCS will compact all of them again to figure out the levels. Another way is to pick smaller SSTable sizes: you may have more compaction churn, but again there is no 100% guarantee that the tombstones you want will be cleaned up.

On Fri, May 16, 2014 at 9:06 AM, Omar Shibli o...@eyeviewdigital.com wrote:
Yes, but still you need to run 'nodetool cleanup' from time to time to make sure all tombstones are deleted.

On Fri, May 16, 2014 at 10:11 AM, Dimetrio dimet...@flysoft.ru wrote:
Does cassandra delete tombstones during simple LCS compaction or should I use nodetool repair? Thanks.

--
Cheers,
-Arya
Re: What % of cassandra developers are employed by Datastax?
Perhaps because the developers are working on DSE :-P

On Fri, May 16, 2014 at 8:13 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. [snip]

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Tombstones
Yes, but still you need to run 'nodetool cleanup' from time to time to make sure all tombstones are deleted.

On Fri, May 16, 2014 at 10:11 AM, Dimetrio dimet...@flysoft.ru wrote:
Does cassandra delete tombstones during simple LCS compaction or should I use nodetool repair? Thanks.
Re: Can Cassandra client programs use hostnames instead of IPs?
Thanks. My case is that there is no public IP and a VPN cannot be set up. It seems that I have to run an EMR job to operate on the AWS cassandra cluster. I got some timeout errors while running the EMR job:

java.lang.RuntimeException: Could not retrieve endpoint ranges:
  at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:333)
  at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
  at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144)
  at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:228)
  at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:213)
  at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:658)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
  at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
  at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
  at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.createThriftClient(BulkRecordWriter.java:348)
  at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:293)
  ... 12 more
Caused by: java.net.ConnectException: Connection timed out
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
  at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
  at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
  at java.net.Socket.connect(Socket.java:579)
  at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
  ... 15 more

I'd appreciate any suggestions.

On Tue, May 13, 2014 at 7:45 AM, Ben Bromhead b...@instaclustr.com wrote:
You can set listen_address in cassandra.yaml to a hostname (http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html). Cassandra will use the IP address returned by a DNS query for that hostname.

On AWS you don't have to assign an Elastic IP; all instances come with a public IP that lasts their lifetime (if you use ec2-classic or your VPC is set up to assign them). Note that whatever hostname you set in a node's listen_address, it will need to resolve to the private IP, as AWS instances only have network access via their private address. Traffic to an instance's public IP is NATed and forwarded to the private address, so you may as well just use the node's IP address.

If you run hadoop on instances in the same AWS region, it will be able to access your Cassandra cluster via private IPs. If you run hadoop externally, just use the public IPs. If you run in a VPC without public addressing and want to connect from external hosts, you will want to look at a VPN (http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html).
Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr http://twitter.com/instaclustr | +61 415 936 359

On 13/05/2014, at 4:31 AM, Huiliang Zhang zhl...@gmail.com wrote:
Hi, Cassandra returns the IPs of the nodes in the cassandra cluster for further communication between the hadoop program and the cassandra cluster. Is there a way to configure the cassandra cluster to return hostnames instead of IPs? My cassandra cluster is on AWS and has no Elastic IPs which can be accessed from outside AWS.
Thanks,
Huiliang
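A sketch of the yaml settings in play (the values are placeholders): listen_address must be reachable by the other nodes, and broadcast_address is what gets advertised to clients and remote DCs, so a setup like

    # cassandra.yaml
    listen_address: 10.0.1.12        # private IP, used for inter-node traffic
    broadcast_address: 54.12.34.56   # public IP, if external clients must connect
    rpc_address: 0.0.0.0             # accept client connections on all interfaces

is the usual shape when a cluster must be reachable from outside the region. With no public IPs and no VPN, as in Huiliang's case, running the job inside the same network (EMR in the same region) is effectively the only option.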
Re: Really need some advices on large data considerations
Hi Michael, thanks for the reply.

> I would RAID0 all those data drives, personally, and give up managing them separately. They are on multiple PCIe controllers, one drive per channel, right?

RAID 0 is a simple way to go, but one disk failure can take the whole volume down, so I am afraid RAID 0 won't be our choice.

> I would highly suggest re-thinking how you want to set up your data model and re-plan your cluster appropriately.

Our data is large but our model is simple: most operations are reads by key, and we never update the data (we only delete periodically). Due to its 'dynamo' architecture, serving this much 'static' data from Cassandra is not a problem. What I am concerned about is the 'dynamic' part: compactions, adding/removing nodes, data re-balancing and the like. The things we care most about are scalability and the fail-over strategy, and Cassandra looks splendid for this: linear scalability, decentralized, auto-partitioning, auto-recovery. So we chose it.

> but if you are using large blobs like image data, think about putting that blob data somewhere else

Any good ideas about this?

The doc you mentioned on the datastax site is great. We're still gathering information and evaluating cassandra, and it'd be great if you have any other suggestions!

Thanks
Best
Data modeling for Pinterest-like application
Hello, I'm working on data modeling for a Pinterest-like project. There are basically two main concepts: Pin and Board, just like Pinterest, where a pin is an item containing an image, description and some other information such as a like count, and each board contains a sorted list of pins.

The board can be modeled with primary key (board_id, created_at, pin_id), where created_at is used to sort the pins of the board by date. The problem is whether I should denormalize the details of pins into the board table, or just retrieve pins by page (a page can be 10~20 pins) and then multi-get by pin_ids to obtain the details.

Since some boards are accessed very often (like the home board), denormalization seems to be a reasonable choice to enhance read performance. However, we then have to update not only the pin table but also each row in the board table that contains the pin whenever a pin is updated, which sometimes could be quite frequent (such as updating the like count). Since a pin may be contained by many boards (could be thousands), denormalization seems to bring a lot of load on the write side, as well as application code complexity.

Any suggestions as to whether our data model should go the denormalized way, or the normalized/multi-get way, which then perhaps needs a separate cache layer for reads?

Thanks,
Ziju
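A sketch of the normalized variant (the schema is illustrative, and a pins table keyed by pin_id is assumed): the board table stores only pin ids in time order, and a page of details is fetched with a second query:

    CREATE TABLE board (
      board_id uuid,
      created_at timeuuid,
      pin_id uuid,
      PRIMARY KEY (board_id, created_at, pin_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, pin_id ASC);

    -- page of the 20 newest pins on a board:
    SELECT pin_id FROM board WHERE board_id = ? LIMIT 20;
    -- then multi-get the details:
    SELECT * FROM pins WHERE pin_id IN (?, ?, ...);

With denormalization the second query disappears, but every like-count update fans out to one write per containing board, which is exactly the trade-off being weighed.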
Re: Storing log structured data in Cassandra without compactions for performance boost.
If you make the timestamp the partition key, you won't be able to do range queries (unless you use an ordered partitioner). Assuming you are logging from multiple devices, you will want your partition key to be the device id plus the date, your clustering key to be the timestamp (timeuuids are good to prevent collisions), and then the log message, levels etc. as the other columns.

Then you can also create a new table for every week (or day/month, depending on how much granularity you want) and just write to the current week's table. This step allows you to delete old data without Cassandra using tombstones (you just drop the table for the week of logs you want to delete). For a much clearer explanation see http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries (the last few slides).

As for compaction, I would leave it enabled, as having lots of sstables hanging around can make range queries slower (the query has more files to visit). See http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction (a little old but still relevant). Compaction also fixes up things like merging row fragments (when you write new columns to the same row).

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 07/05/2014, at 10:55 AM, Kevin Burton bur...@spinn3r.com wrote:
I'm looking at storing log data in Cassandra… Every record is a unique timestamp for the key, and then the log line for the value. I think it would be best to just disable compactions:
- there will never be any deletes.
- all the data will be accessed by time range (probably partitioned randomly) and sequentially.
So every time a memtable flushes, we will just keep that SSTable forever. Compacting the data is kind of redundant in this situation. I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high so compactions are never triggered. Also, it would be IDEAL to be able to tell cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone. Is this possible?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
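A sketch of the layout Ben describes (the column names are illustrative):

    CREATE TABLE logs_2014_w19 (
      device_id uuid,
      day text,            -- date bucket, part of the partition key
      ts timeuuid,         -- clustering key; timeuuid avoids collisions
      level text,
      message text,
      PRIMARY KEY ((device_id, day), ts)
    );

    -- range query within one device/day partition:
    SELECT * FROM logs_2014_w19
      WHERE device_id = ? AND day = '2014-05-07'
        AND ts > minTimeuuid('2014-05-07 10:00+0000');

and a DROP TABLE on the oldest weekly table replaces tombstone-based expiry, per the slides linked above.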
Re: How long are expired values actually returned?
Thank you for your answer; I really appreciate that you want to help me. But I already found out that I did something wrong in my implementation.

On 13.05.2014 02:53, Chris Lohfink wrote:
That is not expected. What client are you using and how are you setting the TTLs? What version of Cassandra?

---
Chris Lohfink

On May 8, 2014, at 9:44 AM, Sebastian Schmidt isib...@gmail.com wrote:
Hi, I'm using the TTL feature for my application. In my tests, when using a TTL of 5, the inserted rows are still returned after 7 seconds, and after 70 seconds. Is this normal or am I doing something wrong?
Kind Regards,
Sebastian
Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
Hello Anton,

What version of Cassandra are you using? Between 1.2.6 and 2.0.6, setInputRange(startToken, endToken) is not working. This was fixed in 2.0.7: https://issues.apache.org/jira/browse/CASSANDRA-6436

If you can't upgrade, you can copy AbstractCFIF and CFIF into your project and apply the patch there.

Cheers,
Paulo

On Wed, May 14, 2014 at 10:29 PM, Anton Brazhnyk anton.brazh...@genesys.com wrote:
Greetings, I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just part of it - something like Spark's sample() function. Cassandra's API seems to allow this with its ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but it doesn't work: the limit is just ignored and the entire column family is scanned. It seems this kind of feature is just not supported, and the source of AbstractColumnFamilyInputFormat.getSplits confirms that (IMO). Questions:
1. Am I right that there is no way to get data limited by token range with ColumnFamilyInputFormat?
2. Is there another way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that the amount is predictable (like 5% of the entire dataset)?
WBR,
Anton

--
Paulo Motta
Chaordic | Platform
www.chaordic.com.br
+55 48 3232.3200
Tombstones on secondary indexes
My system log is full of messages like this one:

WARN [ReadStage:42] 2014-05-15 08:19:13,615 SliceQueryFilter.java (line 210) Read 0 live and 2829 tombstoned cells in TrafficServer.rawData.rawData_evaluated_idx (see tombstone_warn_threshold)

I've run a major compaction but the tombstones are not removed. https://issues.apache.org/jira/browse/CASSANDRA-4314 seems to say that tombstones on secondary indexes are not removed by a compaction. Do I need to do it manually?

Best regards,
Joel Samuelsson
Storing globally sorted data
Let's say I have an external job (MR, pig, etc.) sorting a cassandra table by some complicated mechanism. We want to store the sorted records BACK into cassandra so that clients can read the records sorted.

What I was just thinking of doing was storing the records as pages. So page 0 would have records 0-999… We would just have the key be the page ID and then the values be the primary keys of the records so that they can be fetched. I could also denormalize the data and store the records inline as a materialized view, but of course this would require much more disk space.

Thoughts on this strategy?

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
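A sketch of the page layout (an illustrative schema, assuming the records live in their own table keyed by record_key): the page id is the partition key and the position within the page is the clustering key, so a page comes back in one ordered slice:

    CREATE TABLE sorted_pages (
      page_id int,
      position int,
      record_key uuid,      -- primary key of the underlying record
      PRIMARY KEY (page_id, position)
    );

    SELECT record_key FROM sorted_pages WHERE page_id = 0;  -- records 0-999, in order

followed by a multi-get on record_key: exactly the pointer-then-fetch trade-off against a fatter, denormalized materialized view.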
Re: Really need some advices on large data considerations
You can watch this: https://www.youtube.com/watch?v=uoggWahmWYI - Aaron discusses support for big nodes.

On Wed, May 14, 2014 at 3:13 AM, Yatong Zhang bluefl...@gmail.com wrote:
Thank you Aaron, but we're planning about 20T per node; is that feasible?

On Mon, May 12, 2014 at 4:33 PM, Aaron Morton aa...@thelastpickle.com wrote:
> We've learned that compaction strategy would be an important point, cause we've run into 'no space' trouble because of the 'sized tiered' compaction strategy.
If you want to get the most out of the raw disk space, LCS is the way to go; remember it uses approximately twice the disk IO.

> From our experience, changing any settings/schema once a large cluster is online and has been running for some time is really, really a pain.
Which parts in particular? Updating the schema or config? OpsCenter has a rolling restart feature which can be handy when chef / puppet is deploying the config changes. Schema / gossip can take a little while to propagate with a high number of nodes.

On a modern version you should be able to run 2 to 3 TB per node, maybe higher. The biggest concerns are going to be repair (the changes in 2.1 will help) and bootstrapping. I’d recommend testing a smaller cluster, say 12 nodes, with a high load per node, 3TB.

cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 9/05/2014, at 12:09 pm, Yatong Zhang bluefl...@gmail.com wrote:
Hi, We're going to deploy a large Cassandra cluster at the PB level. Our scenario would be:
1. Lots of writes, about 150 writes/second on average, and about 300K per write.
2. Relatively very few reads.
3. Our data will never be updated.
4. But we will delete old data periodically to free space for new data.

We've learned that compaction strategy is an important point, because we've run into 'no space' trouble with the 'sized tiered' compaction strategy. We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations - is this enough, and up to date? From our experience, changing any settings/schema once a large cluster is online and has been running for some time is really, really a pain. So we're gathering more info and expecting some more practical suggestions before we set up the cassandra cluster. Thanks, and any help is greatly appreciated.
Re: How does cassandra page through low cardinality indexes?
Hello Kevin,

For the internal workings of secondary indexes and LIMIT, you can have a look at this: https://issues.apache.org/jira/browse/CASSANDRA-5975. The comments and attached patch will give you a hint on how LIMIT is implemented. Alternatively, you can look directly in the source code, starting from the classes modified in the patch.

On Fri, May 16, 2014 at 7:53 PM, Kevin Burton bur...@spinn3r.com wrote:
I'm struggling with cassandra secondary indexes, since the documentation seems all over the place and I'm having to put together everything from blog posts. Anyway: if I have a low-cardinality index of say 10 values, and 1M records, then each secondary index key will have references to 100,000 rows. How does Cassandra page through the rows when using LIMIT and paging by the reference? Are the row references sorted in the index? Thanks!

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Best partition type for Cassandra with JBOD
That, and nobarrier… and probably noop for the scheduler if using SSDs, and setting readahead to zero...

On Fri, May 16, 2014 at 10:29 AM, James Campbell ja...@breachintelligence.com wrote:
Hi all—

What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64-bit?

The datastax production server guidelines recommend XFS for data partitions, saying, “Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.”

However, the same document also notes that “Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node,” which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group?

I also noted that the guidelines don’t mention setting the noatime and nodiratime flags in the fstab for data volumes, but I wonder if that’s a common practice.

James

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
… or check out my Google+ profile https://plus.google.com/102718274791889610666/posts
War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
ownership not equally distributed
Hello I have a 4-node cluster where 2 nodes are in one data center and the other 2 are in a different one. But in the first data center the token ownership is not equally distributed. I am using the vnodes feature; num_tokens is set to 256 on all nodes and initial_token is left blank.
Datacenter: DC1
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address        Load      Tokens  Owns   Host ID                               Rack
UN 10.145.84.167  84.58 MB  256     0.4%   ce5ddceb-b1d4-47ac-8d85-249aa7c5e971  RAC1
UN 10.145.84.166  692.69 MB 255     44.2%  e6b5a0fd-20b7-4bf9-9a8e-715cfc823be6  RAC1
Datacenter: DC2
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
-- Address        Load      Tokens  Owns   Host ID                               Rack
UN 10.168.67.43   476 MB    256     27.8%  05dc7ea6-0328-43b8-8b70-bcea856ba41e  RAC1
UN 10.168.67.42   413.15 MB 256     27.7%  677025f0-780c-45dc-bb3b-17ad260fba7d  RAC1
I've run nodetool repair a couple of times, but it didn't help. On the node with less ownership I have seen frequent full GCs a couple of times and had to restart Cassandra. Any suggestions on how to resolve this are highly appreciated. Regards, Rameez
Re: Best partition type for Cassandra with JBOD
Hi, Recommending nobarrier (mount option barrier=0) when you don't know if a non-volatile cache is in play is probably not the way to go. A non-volatile cache will typically ignore write barriers if a given block device is configured to cache writes anyways. I am also skeptical you will see a boost in performance. Applications that want to defer and batch writes won't emit write barriers frequently, and when they do it's because the data has to be there. Filesystems depend on write barriers, although it is surprisingly hard to get a reordering that is really bad because of the way journals are managed. Cassandra uses log structured storage and supports asynchronous periodic group commit, so it doesn't need to emit write barriers frequently. Setting read ahead to zero on an SSD is necessary to get the maximum number of random reads, but will also disable prefetching for sequential reads. You need a lot less prefetching with an SSD due to the much faster response time, but it's still many microseconds. Someone with more Cassandra-specific knowledge can probably give better advice as to when a non-zero read ahead makes sense with Cassandra. This may be workload specific as well. Regards, Ariel On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote: That and nobarrier… and probably noop for the scheduler if using SSD and setting readahead to zero... On Fri, May 16, 2014 at 10:29 AM, James Campbell [1]ja...@breachintelligence.com wrote: Hi all— What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64bit? The datastax production server guidelines recommend XFS for data partitions, saying, “Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.” However, the same document also notes that “Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node,” which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group? I also noted that the guidelines don’t mention setting noatime and nodiratime flags in the fstab for data volumes, but I wonder if that’s a common practice. James -- Founder/CEO [2]Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: [3]http://burtonator.wordpress.com … or check out my [4]Google+ profile [5][spinn3r.jpg] War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. References 1. mailto:ja...@breachintelligence.com 2. http://Spinn3r.com/ 3. http://burtonator.wordpress.com/ 4. https://plus.google.com/102718274791889610666/posts 5. http://spinn3r.com/
Best partition type for Cassandra with JBOD
Hi all- What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64bit? The datastax production server guidelines recommend XFS for data partitions, saying, "Because Cassandra can use almost half your disk space for a single file, use XFS when using large disks, particularly if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit." However, the same document also notes that "Maximum recommended capacity for Cassandra 1.2 and later is 3 to 5TB per node," which makes me think 16TB file sizes would be irrelevant (especially when not using RAID to create a single large volume). What has been the experience of this group? I also noted that the guidelines don't mention setting noatime and nodiratime flags in the fstab for data volumes, but I wonder if that's a common practice. James
Questions on Leveled Compaction sizing and compaction corner cases
I was reading http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra and need some confirmation: A) Sizing. "Each level is ten times as large as the previous." In the comments: on October 14, 2011 at 12:33 am (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-18817) Jonathan said: "L1 gets 50MB (~10 sstables of data), L2 gets 500MB/100 sstables, L3 gets 5GB." On January 22, 2013 at 7:51 pm (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-196897) he said: "Remember that within a level, data is guaranteed not to overlap across sstables", or put another way, "a given row will be in at most one sstable". On February 11, 2013 at 7:32 am (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-196901) he said: "A compaction will run whenever there is more data in level N > 0 than desired (sstable_size_in_mb * 10^level)". On February 22, 2013 at 8:20 am (http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra#comment-196904) he said: "Leveled compaction restricts sstable size to 5MB or a single row, whichever is larger." If I put all the info together: 1) sizeOf(Ln+1) = 10 * sizeOf(Ln) = sstable_size_in_mb * 10^(n+1); 2) the size of an sstable is limited to sstable_size_in_mb by default, or more if a single partition is large enough to exceed sstable_size_in_mb; 3) because of point 2), the equality ssTableCount(Ln+1) = 10 * ssTableCount(Ln) does not always hold. Is this correct so far? B) Compaction corner cases. Now, one of the biggest selling points of LCS is its frequent compaction and that 90% of reads only touch 1 SSTable. Fine. Let's suppose we have data in 4 levels (taking the new default of 160MB for sstable_size_in_mb): L0, L1 (1.6GB), L2 (16GB), L3 (160GB, partially filled). For some reason there was a burst of writes in the application, so data got compacted up to L3. Now that the write/update workload is back to normal, compaction never goes beyond L2. In this case, all my old/deleted/obsolete data in L3 will never be compacted, will it? Or only at the next write burst, right? Regards Duy Hai DOAN
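For reference, a hedged sketch of where sstable_size_in_mb is set, with the sizing arithmetic above worked out in comments (keyspace and table names are placeholders):

    -- With sstable_size_in_mb = 160, each level holds roughly:
    --   L1: 160 MB * 10   = 1.6 GB
    --   L2: 160 MB * 100  = 16 GB
    --   L3: 160 MB * 1000 = 160 GB
    ALTER TABLE my_ks.my_table
      WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                          'sstable_size_in_mb' : 160 };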
Re: Storing globally sorted data
What you show is basically the idea of bucketing data. One bucket = one physical partition. Within each bucket there is a fixed number of columns (1000 in your example). This strategy works fine and avoids overly large partitions. The only drawback I see is the need to fetch data across buckets, but since in your case you fetch data by partition it should be OK. About denormalizing, it's the way to go. Disk space is sometimes cheaper than the high read latency caused by a normalized data model. On Fri, May 16, 2014 at 8:41 PM, Kevin Burton bur...@spinn3r.com wrote: Let's say I have an external job (MR, pig, etc) sorting a cassandra table by some complicated mechanism. We want to store the sorted records BACK into cassandra so that clients can read the records sorted. What I was just thinking of doing was storing the records as pages. So page 0 would have records 0-999…. We would just have the key be the page ID and then the values be the primary keys for the records so that they can be fetched. I could also denormalize the data and store them inline as a materialized view, but of course this would require much more disk space. Thoughts on this strategy? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
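For illustration, a minimal CQL sketch of the page-bucket idea (table and column names are hypothetical):

    CREATE TABLE sorted_pages (
        page_id   int,
        seq       int,    -- position within the page, 0-999
        record_id uuid,   -- primary key of the record in the source table
        PRIMARY KEY (page_id, seq)
    );
    -- fetch page 0 (the first 1000 records in sorted order) in one query
    SELECT record_id FROM sorted_pages WHERE page_id = 0;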
Re: Data modeling for Pinterest-like application
"The problem is whether I should denormalize details of pins into the board table or just retrieve pins by page (page size can be 10~20) and then multi-get by pin_ids to obtain details" -- Denormalizing is the best way to go in your case. Otherwise, for 1 board read, you'll have 10-20 subsequent reads to load the pins. Multiply that by the number of users listing boards and you'll quickly be in trouble... For updating a pin's like count, you'll need to use the counter type. "denormalization seems to bring a lot of load on the write side as well as application code complexity" -- First, C* copes quite well with write load. Second, you should ask yourself: how often is the update scenario vs the read scenario? Usually the read pattern is predominant. As for update code complexity, it's the price to pay for read performance. The CQRS pattern will help you separate the write and read stages, as will heavy unit and integration testing. On Fri, May 16, 2014 at 5:14 AM, ziju feng pkdog...@gmail.com wrote: Hello, I'm working on data modeling for a Pinterest-like project. There are basically two main concepts: Pin and Board, just like Pinterest, where a pin is an item containing an image, description and some other information such as a like count, and each board contains a sorted list of pins. The board can be modeled with primary key (board_id, created_at, pin_id), where created_at is used to sort the pins of the board by date. The problem is whether I should denormalize details of pins into the board table or just retrieve pins by page (page size can be 10~20) and then multi-get by pin_ids to obtain details. Since there are some boards that are accessed very often (like the home board), denormalization seems to be a reasonable choice to enhance read performance. However, we then have to update not only the pin table but also each row in the board table that contains the pin whenever a pin is updated, which sometimes could be quite frequent (such as updating the like count). Since a pin may be contained by many boards (could be thousands), denormalization seems to bring a lot of load on the write side as well as application code complexity. Any suggestion as to whether our data model should go the denormalized way, or the normalized/multi-get way, which then perhaps needs a separate cache layer for reads? Thanks, Ziju
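A hedged CQL sketch of this model (all names are hypothetical). Note that counter columns cannot share a table with regular columns, so the like count goes in its own table, which also sidesteps rewriting it in every board row:

    CREATE TABLE board_pins (
        board_id    uuid,
        created_at  timeuuid,
        pin_id      uuid,
        image_url   text,   -- denormalized pin details
        description text,
        PRIMARY KEY (board_id, created_at, pin_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, pin_id ASC);

    CREATE TABLE pin_likes (
        pin_id uuid PRIMARY KEY,
        likes  counter
    );
    UPDATE pin_likes SET likes = likes + 1
     WHERE pin_id = 123e4567-e89b-12d3-a456-426655440000;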
Re: Backup procedure
It's also good to note that only the Data files are compressed already. Depending on your data, the Index and other files may be a significant percentage of the total on-disk data. On 05/02/2014 01:14 PM, tommaso barbugli wrote: In my tests, compressing sstables with lzop (with Cassandra compression turned on) resulted in approx. 50% smaller files. That's probably because the chunks of data compressed by lzop are way bigger than the average size of writes performed on Cassandra (not sure how data is compressed but I guess it is done per single cell, so unless one stores…) 2014-05-02 19:01 GMT+02:00 Robert Coli rc...@eventbrite.com: On Fri, May 2, 2014 at 2:07 AM, tommaso barbugli tbarbu...@gmail.com wrote: If you are thinking about using Amazon S3 storage, I wrote a tool that performs snapshots and backups on multiple nodes. Backups are stored compressed on S3. https://github.com/tbarbugli/cassandra_snapshotter https://github.com/JeremyGrosser/tablesnap SSTables in Cassandra are compressed by default, if you are re-compressing them you may just be wasting CPU.. :) =Rob
Number of rows under one partition key
Hi, I know this has been discussed before, and I know there are limitations to how many rows one partition key can handle in practice. But I am not sure whether the number of rows or the total data size is the deciding factor. I know the thrift interface well, but this is my first project where we are actively using CQL, so this is also new for me. The case is like this: We have a partition key, clientid, which has a clustering key (id). The number of rows with the same clientid is normally between 10,000 and 100,000, I would guess. The data is pretty small, let's say 200 bytes per row on average (probably even smaller, but for the example let's assume 200 bytes). There _CAN_ however be more rows with the same clientid in some edge cases, I would guess up to 1,000,000. Most of the time we read with both id and clientid, or we read for example 1,000 rows with just clientid. It would be nice to be able to fetch all rows in one query, if possible. The ratio of reads to writes is about 100 to 1. Of the writes, I guess updates versus inserts is about 1:4. Deletes are rare. Currently the production environment is Cassandra 1.2.11, but we are testing this on Cassandra 2.0.something in our development environment. Questions: Should we add another partition key component to avoid 1,000,000 rows in the same thrift row (which is how I understand it is actually stored)? Or is 1,000,000 rows okay? If we add a bucketid-ish thing to the partition key, how should we do queries most effectively? Since reading is the most important, and writing and space are not an issue, should we have a high replication factor and read from (relatively) few nodes? When it comes to consistency, it isn't a problem waiting for everything to be replicated to responsive nodes (within some ms or even seconds), but if a node goes down and contains very old data (multiple minutes, hours or days), that would be a problem, at least if it happened regularly. What, in practice, is the cost of reading with a high number of nodes in the consistency level? Does "replicate to 4 nodes, read from 2" sound like an OK option here (avoiding full consistency, but at the same time, if one node crashes and comes up with old data we would still get a pretty consistent result; the probability of 2 of the nodes crashing at the same time is low, and _maybe_ something we can live with in this specific case)? Other considerations, for example compaction strategy, and whether we should upgrade to 2.0 because of this (we will upgrade anyway, but if it is recommended we will continue to use 2.0 in development and upgrade the production environment sooner)? I have done some testing, inserting a million rows and selecting them all, counting them and selecting individual rows (with both clientid and id), and it seems fine, but I want to ask to be sure that I am on the right track. Best regards, Vegard Berget
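If bucketing turns out to be needed, a minimal CQL sketch of a composite partition key (all names are hypothetical):

    CREATE TABLE client_rows (
        clientid text,
        bucket   int,      -- e.g. derived from id, keeping each bucket well under ~100k rows
        id       timeuuid,
        payload  text,
        PRIMARY KEY ((clientid, bucket), id)
    );
    -- each bucket is read with its own query (or with IN over the bucket values)
    SELECT * FROM client_rows WHERE clientid = 'c42' AND bucket = 0;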
Re: Couter column family performance problems
On Mon, May 12, 2014 at 3:03 PM, Batranut Bogdan batra...@yahoo.com wrote: I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d counter Feel free to comment and share experiences about counter CF performance. Briefly: 1) The original counter implementation is slow and somewhat fragile. 2) Counters have been reworked in 2.1 to be decently fast and less fragile. 3) Probably neither should be used in extreme-volume cases (high rate of update) or when high counter accuracy is required. https://issues.apache.org/jira/browse/CASSANDRA-6504 =Rob
Re: Tombstones
Note that Cassandra will not compact away some tombstones if you have differing column TTLs. See the following jira and resolution I filed for this: https://issues.apache.org/jira/browse/CASSANDRA-6654 On May 16, 2014 4:49 PM, Chris Lohfink clohf...@blackbirdit.com wrote: It will delete them after gc_grace_seconds (set per table) and a compaction. --- Chris Lohfink On May 16, 2014, at 9:11 AM, Dimetrio dimet...@flysoft.ru wrote: Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Failed to mkdirs $HOME/.cassandra
For now you can edit the nodetool script itself by adding -Duser.home=/tmp, as in:
    $JAVA $JAVA_AGENT -cp $CLASSPATH -Xmx32m -Duser.home=/tmp -Dlogback.configurationFile=logback-tools.xml -Dstorage-config=$CASSANDRA_CONF org.apache.cassandra.tools.NodeTool -p $JMX_PORT $ARGS
If you like, you can file an issue in JIRA. On 2014-05-09 18:42, Bryan Talbot wrote: How should the nodetool command be run as the user nobody? The nodetool command fails with an exception if it cannot create a .cassandra directory in the current user's home directory. I'd like to schedule some nodetool commands to run with least privilege as cron jobs. I'd like to run them as the nobody user -- which typically has / as the home directory -- since that's what the user is typically used for (minimum privileges). None of the methods described in this JIRA actually seem to work (with 2.0.7 anyway): https://issues.apache.org/jira/browse/CASSANDRA-6475 [1] Testing as a normal user with no write permissions to the home directory (to simulate the nobody user):
    [vagrant@local-dev ~]$ nodetool version
    ReleaseVersion: 2.0.7
    [vagrant@local-dev ~]$ rm -rf .cassandra/
    [vagrant@local-dev ~]$ chmod a-w .
    [vagrant@local-dev ~]$ nodetool flush my_ks my_cf
    Exception in thread main FSWriteError in /home/vagrant/.cassandra
      at org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:305)
      at org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:690)
      at org.apache.cassandra.tools.NodeCmd.printHistory(NodeCmd.java:1504)
      at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:1204)
    Caused by: java.io.IOException: Failed to mkdirs /home/vagrant/.cassandra
      ... 4 more
    [vagrant@local-dev ~]$ HOME=/tmp nodetool flush my_ks my_cf
    (same FSWriteError stack trace)
    [vagrant@local-dev ~]$ env HOME=/tmp nodetool flush my_ks my_cf
    (same FSWriteError stack trace)
    [vagrant@local-dev ~]$ env user.home=/tmp nodetool flush my_ks my_cf
    (same FSWriteError stack trace)
    [vagrant@local-dev ~]$ nodetool -Duser.home=/tmp flush my_ks my_cf
    Unrecognized option: -Duser.home=/tmp
    usage: java org.apache.cassandra.tools.NodeCmd --host arg command ...
Links: -- [1] https://issues.apache.org/jira/browse/CASSANDRA-6475
Re: What % of cassandra developers are employed by Datastax?
You can always check the project committer wiki: http://wiki.apache.org/cassandra/Committers -- Jack Krupansky From: Kevin Burton Sent: Wednesday, May 14, 2014 4:39 PM To: user@cassandra.apache.org Subject: What % of cassandra developers are employed by Datastax? I'm curious what % of cassandra developers are employed by Datastax? … vs other companies. When MySQL was acquired by Oracle this became a big issue because even though you can't really buy an Open Source project, you can acquire all the developers and essentially do the same thing. It would be sad if all of Cassandra's 'eggs' were in one basket and a similar situation happens with Datastax. Seems like they're doing an awesome job to be sure but I guess it worries me in the back of my mind. -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Query the first row for each partition key in CQL?
Hi, I'm modeling some queries in CQL3. I'd like to query the first row for each partition key. For example:
create table posts(
    author ascii,
    created_at timeuuid,
    entry text,
    primary key(author, created_at)
);
insert into posts(author, created_at, entry) values ('john', minTimeuuid('2013-02-02 10:00+0000'), 'This is an old entry by john');
insert into posts(author, created_at, entry) values ('john', minTimeuuid('2013-03-03 10:00+0000'), 'This is a new entry by john');
insert into posts(author, created_at, entry) values ('mike', minTimeuuid('2013-02-02 10:00+0000'), 'This is an old entry by mike');
insert into posts(author, created_at, entry) values ('mike', minTimeuuid('2013-03-03 10:00+0000'), 'This is a new entry by mike');
And I want results like below:
mike, 1c4d9000-83e9-11e2-8080-808080808080, This is a new entry by mike
john, 1c4d9000-83e9-11e2-8080-808080808080, This is a new entry by john
I think this is what SELECT FIRST statements did in CQL2. The only way I've come across in CQL3 is to retrieve whole records and drop the extras manually, but that's obviously not efficient. Could you please tell me a more straightforward way in CQL3?
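For readers on newer versions: this now has a direct answer. Cassandra 3.6 added PER PARTITION LIMIT (CASSANDRA-7017); nothing equivalent existed in the 2.0-era CQL3 this thread discusses:

    SELECT author, created_at, entry FROM posts PER PARTITION LIMIT 1;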
Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
Hi Anton, One approach you could look at is to write a custom InputFormat that allows you to limit the token range of rows that you fetch (if the AbstractColumnFamilyInputFormat does not do what you want). Doing so is not too much work. If you look at the class RowIterator within CqlRecordReader, you can see code in the constructor that creates a query with a certain token range: ResultSet rs = session.execute(cqlQuery, type.compose(type.fromString(split.getStartToken())), type.compose(type.fromString(split.getEndToken())) ); I think you can make a new version of the InputFormat and just tweak this method to achieve what you want. Alternatively, if you just want to get a sample of the data, you might want to change the InputFormat itself such that it chooses to query only a subset of the total input splits (or CfSplits). That might be easier. Best regards, Clint On Wed, May 14, 2014 at 6:29 PM, Anton Brazhnyk anton.brazh...@genesys.com wrote: Greetings, I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just part of it - something like Spark's sample() function. Cassandra's API seems allow to do it with its ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but it doesn't work. The limit is just ignored and the entire column family is scanned. It seems this kind of feature is just not supported and sources of AbstractColumnFamilyInputFormat.getSplits confirm that (IMO). Questions: 1. Am I right that there is no way to get some data limited by token range with ColumnFamilyInputFormat? 2. Is there other way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that this amount is predictable (like 5% of entire dataset)? WBR, Anton
How does cassandra page through low cardinality indexes?
I'm struggling with cassandra secondary indexes since the documentation seems all over the place and I'm having to put together everything from blog posts. Anyway. If I have a low cardinality index of say 10 values, and 1M records, this means each secondary index key will have references to 100,000 rows. How does Cassandra page through the rows when using LIMIT and paging by the reference? Are the row references sorted in the index? Thanks! -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Tombstones
It will delete them after gc_grace_seconds (set per table) and a compaction. --- Chris Lohfink On May 16, 2014, at 9:11 AM, Dimetrio dimet...@flysoft.ru wrote: Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
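For reference, gc_grace_seconds is an ordinary table property. A hedged example (keyspace and table names are placeholders; 864000 seconds is the usual 10-day default):

    ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 864000;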
Re: What % of cassandra developers are employed by Datastax?
There does seem to be some effort to encourage others - DataStax had some talks explaining how to contribute, and this year there is even an extra bootcamp: http://learn.datastax.com/CassandraSummitBootcampApplication.html On May 16, 2014, at 9:47 AM, Peter Lin wool...@gmail.com wrote: perhaps the committers should invite other developers who have shown an interest in contributing to Cassandra. The rate of adding new non-DataStax committers appears to have been low the last 2 years. I have no data to support it; it's just a feeling based on personal observations over the last 3 years.
Re: Storing log structured data in Cassandra without compactions for performance boost.
If the data being read is a slice of a partition that has been added to over time, there will be a part of that row in almost every sstable. That would mean all of them (multiple disk seeks per sstable, depending on clustering order) would have to be read in order to service the query. The data model can help or hurt a lot, though. Yes… totally agree, but we wouldn't do that. The entire 'row' is immutable and passes through the system and then expires due to TTL. TTL is probably the way to go here, especially if Cassandra just drops the whole SSTable on TTL expiration, which is what I think I'm hearing. If you set the TTL for the columns you added, then C* will clean up sstables (if size-tiered and post-1.2) once the data has expired. Since you never delete, set gc_grace_seconds to 0 so the TTL expiration doesn't result in lingering tombstones. Thanks! Kevin -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
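A hedged CQL sketch of this pattern (all names are hypothetical; default_time_to_live is a table-level option from 2.0 onward):

    CREATE TABLE event_log (
        stream  text,
        ts      timeuuid,
        payload text,
        PRIMARY KEY (stream, ts)
    ) WITH gc_grace_seconds = 0            -- no manual deletes, so no grace period needed
      AND default_time_to_live = 604800;   -- every column expires after 7 days

    -- a TTL can also be set per write:
    INSERT INTO event_log (stream, ts, payload)
    VALUES ('crawler-1', now(), '...') USING TTL 604800;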
Re: What % of cassandra developers are employed by Datastax?
I used cassandra for years at NYSE, and we were able to do what we wanted with it by leveraging open source and internal development, knowing that cassandra did what we wanted it to do and that no one could ever take the code away from us in a worst-case scenario. Compare and contrast that with the pure proprietary model, and I'm sure it will help you sleep easier. -- Colin Clark +1-320-221-9531 On May 15, 2014, at 10:52 AM, Jack Krupansky j...@basetechnology.com wrote: You can always check the project committer wiki: http://wiki.apache.org/cassandra/Committers -- Jack Krupansky From: Kevin Burton Sent: Wednesday, May 14, 2014 4:39 PM To: user@cassandra.apache.org Subject: What % of cassandra developers are employed by Datastax? I'm curious what % of cassandra developers are employed by Datastax? … vs other companies. When MySQL was acquired by Oracle this became a big issue because even though you can't really buy an Open Source project, you can acquire all the developers and essentially do the same thing. It would be sad if all of Cassandra's 'eggs' were in one basket and a similar situation happens with Datastax. Seems like they're doing an awesome job to be sure but I guess it worries me in the back of my mind. -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
Re: Data modeling for Pinterest-like application
Thanks for your answer, I really like the frequency-of-update vs. read way of thinking. A related question is whether it is a good idea to denormalize the read-heavy part of the data while normalizing other, less frequently accessed data. Our app will have a limited number of system-managed boards that are viewed by every user, so it makes sense to denormalize and propagate updates of pins to these boards. We will also have a like board for each user containing pins that they like, which can be somewhat private and only viewed by the owner. Since a pin can potentially be liked by thousands of users, if we also denormalize the like board, every time that pin is liked by another user we would have to update the like count in thousands of like boards. Does normalization work better in this case, or can Cassandra handle this kind of write load? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-modeling-for-Pinterest-like-application-tp7594481p7594517.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Multi-dc cassandra keyspace
It's often an excellent strategy. No known issues. -Tupshin On May 16, 2014 4:13 PM, Anand Somani meatfor...@gmail.com wrote: Hi, It seems like it should be possible to have a keyspace replicated only to a subset of DCs on a cluster spanning multiple DCs? Is there anything bad about this approach? Scenario: a cluster spanning 4 DCs = CA, TX, NY, UT, with multiple keyspaces such that * keyspace_CA_TX - replication_strategy = {CA = 3, TX = 3} * keyspace_UT_NY - replication_strategy = {UT = 3, NY = 3} * keyspace_CA_UT - replication_strategy = {UT = 3, CA = 3} I am going to try this out, but was curious if anybody out there has tried it. Thanks Anand
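For reference, this maps directly onto per-DC replication factors with NetworkTopologyStrategy; a minimal sketch (the DC names must match what your snitch reports):

    CREATE KEYSPACE keyspace_ca_tx
      WITH replication = { 'class' : 'NetworkTopologyStrategy',
                           'CA' : 3, 'TX' : 3 };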
Re: Mutation messages dropped
Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for further details. Mark On Fri, May 9, 2014 at 12:52 PM, Raveendran, Varsha IN BLR STS varsha.raveend...@siemens.com wrote: Hello, I am writing around 10 million records continuously into a single-node Cassandra (2.0.5). In the Cassandra log file I see an entry "272 MUTATION messages dropped in last 5000ms". Does this mean that 272 records were not written successfully? Thanks, Varsha
Index with same Name but different keyspace
Hi, I am using Cassandra version 2.0.5. I am trying to set up 2 keyspaces with the same tables for different kinds of testing. While creating indexes on the tables, I realized I am not able to use the same index name even though the tables are in different keyspaces. Is a unique index name across keyspaces a requirement/feature? -- Regards, Mahesh Rajamani
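A workaround sketch, assuming the limitation holds on this version: give each index an explicit name that embeds the keyspace (table and column names are hypothetical):

    CREATE INDEX ks1_users_email_idx ON ks1.users (email);
    CREATE INDEX ks2_users_email_idx ON ks2.users (email);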
Re: Couter column family performance problems
What version are you using? And what consistency level are you using for your inserts? CL.ONE, for instance, can end up with a large backlog in the replicateOnWrite (or CounterMutation, depending on version) stage, since that stage happens outside the feedback loop of the request and can be a little slow. If nodetool tpstats shows large pending/blocked counts there, you might be overrunning your capacity. --- Chris Lohfink On May 12, 2014, at 5:03 PM, Batranut Bogdan batra...@yahoo.com wrote: Hello all, I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d counter After inserting a few million keys... 55 mil, the performance goes down the drain. 2-3 nodes in the cluster are under medium load, and when inserting batches of the same length, writes take longer and longer until the whole cluster becomes loaded and I get a lot of TExceptions... and the cluster becomes unresponsive. Did anyone have the same problem? Feel free to comment and share experiences about counter CF performance.
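For context, a minimal sketch of the kind of counter writes being discussed (the schema is from the original post; the table name and key are hypothetical, and the consistency level is chosen in the client driver rather than in CQL):

    CREATE TABLE counters (
        pk text PRIMARY KEY,
        a counter, b counter, c counter, d counter
    );
    -- counter columns can only be modified by increment/decrement
    UPDATE counters SET a = a + 1, b = b + 1 WHERE pk = 'some-key';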
Running Production Cluster at Rackspace
Hi, can anyone point me to recommendations for hosting and configuration requirements when running a Production Cassandra Cluster at Rackspace? Are there reference projects that document the suitability of Rackspace for running a production Cassandra cluster? Jan
RE: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)
Hi Paulo, I'm using C* 1.2.15 and have no easy option to upgrade (at least not to the 2.0.* branch). I've started looking into implementing my own variant of InputFormat. Thanks a lot for the hint; I will definitely check how it's done in 2.0.7 and whether it's possible to backport it to the 1.2.* branch. WBR, Anton From: Paulo Ricardo Motta Gomes [mailto:paulo.mo...@chaordicsystems.com] Sent: Thursday, May 15, 2014 3:21 AM To: user@cassandra.apache.org Subject: Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat) Hello Anton, What version of Cassandra are you using? If between 1.2.6 and 2.0.6, setInputRange(startToken, endToken) is not working. This was fixed in 2.0.7: https://issues.apache.org/jira/browse/CASSANDRA-6436 If you can't upgrade, you can copy AbstractCFIF and CFIF to your project and apply the patch there. Cheers, Paulo On Wed, May 14, 2014 at 10:29 PM, Anton Brazhnyk anton.brazh...@genesys.com wrote: Greetings, I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like to read just part of it - something like Spark's sample() function. Cassandra's API seems to allow this with its ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but it doesn't work. The limit is just ignored and the entire column family is scanned. It seems this kind of feature is just not supported and the sources of AbstractColumnFamilyInputFormat.getSplits confirm that (IMO). Questions: 1. Am I right that there is no way to get some data limited by token range with ColumnFamilyInputFormat? 2. Is there another way to limit the amount of data read from Cassandra with Spark and ColumnFamilyInputFormat, so that this amount is predictable (like 5% of the entire dataset)? WBR, Anton -- Paulo Motta Chaordic | Platform www.chaordic.com.br +55 48 3232.3200
null date bug? Not sure if its cassandra 2.0.5 or the gocql (golang) driver.
I'm noticing the following strange behaviour when I do a query on a table:
cqlsh:mykeyspace> select uuid, discontinued_from from mytable;
 uuid                                 | discontinued_from
--------------------------------------+--------------------------
 b838a632-dd61-11e3-a32e-b8f6b11b1965 | -6795364578.871
 b838e9b4-dd61-11e3-a330-b8f6b11b1965 | -6795364578.871
 b838c725-dd61-11e3-a32f-b8f6b11b1965 | -6795364578.871
 b8390aeb-dd61-11e3-a331-b8f6b11b1965 | 2014-01-01 10:00:00+1100
 b83840b7-dd61-11e3-a32c-b8f6b11b1965 | -6795364578.871
 b83882fc-dd61-11e3-a32d-b8f6b11b1965 | -6795364578.871
(6 rows)
Failed to format value -6795364578.871 as timestamp: year out of range
Failed to format value -6795364578.871 as timestamp: year out of range
3 more decoding errors suppressed.
The discontinued_from field is being updated with a golang time.Time variable that is either correctly initialised or left as null, i.e.:
err := cql.Query("update mytable set name=?, discontinued_from=?, updated=? where uuid=?", name, discontinuedFrom, time.Now().UnixNano(), s.Id).Exec()
I would have expected that updating a timestamp with a null value would result in a null in the row. Is this a bug in gocql? Or am I misunderstanding how null can be used? (Is it not possible or allowed to set something to null?) Thanks, Jacob
Clustering order and secondary index
Hi all, I'm trying to migrate my old project, born with Cassandra 0.6 and grown through 0.7/1.0, to the latest 2.0. I have an easy question for you all: do queries using only secondary indexes not respect any clustering order? Thanks
Re: Efficient bulk range deletions without compactions by dropping SSTables.
Hello Kevin, In 2.0.X an SSTable is automatically dropped if it contains only tombstones: https://issues.apache.org/jira/browse/CASSANDRA-5228. However, this will most likely happen only if you use LCS; STCS will create sstables of larger size that will probably mix expired and unexpired data. This could be solved by the single-sstable tombstone compaction, which unfortunately is not working well (https://issues.apache.org/jira/browse/CASSANDRA-6563). I don't know of a way to manually drop specific sstables safely; you could try implementing a script that compares sstable timestamps to check whether an sstable is safely droppable, as done in CASSANDRA-5228. There are proposals to create a compaction strategy optimized for log-only data that only deletes old sstables, but it's not ready yet AFAIK. Cheers, Paulo On Mon, May 12, 2014 at 8:53 PM, Kevin Burton bur...@spinn3r.com wrote: We have a log-only data structure… everything is appended and nothing is ever updated. We should be totally fine with having lots of SSTables sitting on disk, because even if we did a major compaction the data would still look the same. By 'lots' I mean maybe 1000 max, maybe 1GB each. However, I would like a way to delete older data. One way to solve this could be to just drop an entire SSTable if all the records inside have tombstones. Is this possible, to just drop a specific SSTable? -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people. -- Paulo Motta Chaordic | Platform www.chaordic.com.br +55 48 3232.3200
Re: What % of cassandra developers are employed by Datastax?
so 30%… according to that data. On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote: On 05/14/2014 03:39 PM, Kevin Burton wrote: I'm curious what % of cassandra developers are employed by Datastax? http://wiki.apache.org/cassandra/Committers -- Kind regards, Michael -- Founder/CEO Spinn3r.com Location: San Francisco, CA Skype: burtonator blog: http://burtonator.wordpress.com … or check out my Google+ profile: https://plus.google.com/102718274791889610666/posts http://spinn3r.com War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.