They are exponentially decaying moving averages (like Unix load averages)
of the number of events per unit of time.
http://wiki.apache.org/cassandra/Metrics might help
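To make the "exponentially decaying moving average" idea concrete, here is a minimal Python sketch. It assumes a 5-second tick and a one-minute window, in the style of the Metrics library's meters; the class name `Ewma` and its methods are hypothetical and are not Cassandra's or the library's API.

```python
import math


class Ewma:
    """Simplified exponentially weighted moving average of an event rate."""

    def __init__(self, window_seconds=60.0, tick_seconds=5.0):
        # Smoothing factor: how much weight the newest tick receives.
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.tick_seconds = tick_seconds
        self.uncounted = 0   # events seen since the last tick
        self.rate = None     # events per second, None until the first tick

    def mark(self, n=1):
        """Record n events."""
        self.uncounted += n

    def tick(self):
        """Fold the events of the last interval into the moving average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate
```

Under these assumptions, a steady stream of writes makes the rate settle at the true events-per-second figure, and after traffic stops the value decays gradually instead of dropping to zero at once, which is why a OneMinuteRate is a weighted recent rate rather than an exact count over the last 60 seconds.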
On 04/17/2014 06:06 PM, Redmumba wrote:
Good afternoon,
I'm attempting to integrate the metrics generated via JMX into our
What does the rate signify in this context? For example, given the
OneMinuteRate of 675.7673129014964 and the unit of seconds--what is this
measuring?
means that there were 675 write requests per second over the last one minute.
As Other Chris (tm) mentioned, this is exponentially decaying
I tried to do this, however the doubling in disk space is not temporary
as you state in your note. What am I missing?
On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
ober...@civicscience.com wrote:
So, if I was impatient and just wanted to make this happen now, I could:
1.) Change
Hi,
It seems like it should be possible to have a keyspace replicated only to a
subset of DC's on a given cluster spanning across multiple DCs? Is there
anything bad about this approach?
Scenario
Cluster spanning 4 DC's = CA, TX, NY, UT
Has multiple keyspaces such that
* keyspace_CA_TX -
In the system we're using, we have a large fleet of servers constantly
appending time-based data to our database--it's largely writes, very few
reads (it's auditing data). However, our cluster max space is around 80TB,
and we'd like to maximize how much data we can retain.
One option is to
Does Cassandra delete tombstones during a simple LCS compaction, or should I use
nodetool repair?
Thanks.
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467.html
Sent from the cassandra-u...@incubator.apache.org mailing list
That'll be really useful, thanks!!
On Wed, May 14, 2014 at 7:47 PM, Aaron Morton aa...@thelastpickle.com wrote:
As of 2.0.7, driftx has added this long-requested feature.
Thanks
A
-
Aaron Morton
New Zealand
@aaronmorton
Co-Founder Principal Consultant
Apache
Just a few data points from our experience
One of our use cases involves storing a periodic full base state for millions
of records, then fairly frequent delta updates to subsets of the records in
between. C* is great for this because we can read the whole row (or up to the
clustering
It means asynchronous write mutations were dropped, but if the writes are
completing without TimedOutException, then at least ConsistencyLevel
replicas were correctly written. The remaining replicas will eventually be
fixed by hinted handoff, anti-entropy (repair) or read repair.
More info:
Perhaps the committers should invite other developers who have shown an
interest in contributing to Cassandra.
The rate of adding new non-DataStax committers appears to have been low over the last
2 years. I have no data to support this; it's just a feeling based on personal
observations over the last 3 years.
Hello,
Have you looked at using the CLUSTERING ORDER BY and LIMIT features of
CQL3?
These may help you achieve your goals.
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html
Of the 16 active committers, 8 are not at DataStax. See
http://wiki.apache.org/cassandra/Committers. That said, active involvement
varies and there are other contributors inside DataStax and in the community.
You can look at the dev mailing list as well to look for involvement in more
Why not use NetworkTopology and specify each region as a ‘DC’?
Set up a snitch (propertyFile or Gossip, or even the EC2Region one) to list out
which nodes are in which DC.
Then when creating the Keyspace, specify NetworkTopology, with RF1 in each DC
/ Rack.
Ie.
CREATE KEYSPACE
Cassaforte [1] is a Clojure client for Cassandra built around CQL
and focusing on ease of use.
Release notes:
http://blog.clojurewerkz.org/blog/2014/05/15/cassaforte-1-dot-3-0-is-released/
1. http://clojurecassandra.info
--
MK
http://github.com/michaelklishin
http://twitter.com/michaelklishin
On 05/14/2014 03:39 PM, Kevin Burton wrote:
I'm curious what % of cassandra developers are employed by Datastax?
http://wiki.apache.org/cassandra/Committers
--
Kind regards,
Michael
Don’t know, but as a potential customer of DataStax I’m also concerned at the
fact that there does not seem to be a competitor offering Cassandra support and
services. All innovation seems to be occurring only in the OSS version or
DSE(*). I’d welcome a competitor for DSE - it does not even
Shameless plug:
http://www.evidencebasedit.com/guide-to-cassandra-thread-pools/#droppable
On May 15, 2014, at 7:37 PM, Mark Reddy mark.re...@boxever.com wrote:
Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for
further details.
Mark
On Fri, May 9, 2014 at
Hi all,
More than a year ago I wrote a comment about migrating an old schema to a new
model.
Since the company had other priorities we never carried it out, and now I'm trying
to upgrade
my 0.6 data model to the newest 2.0 model.
The DB contains mainly comments written by users on companies.
Comments
Here’s a meetup talk on analytics using Cassandra, Storm, and Kafka:
http://www.slideshare.net/aih1013/building-largescale-analytics-platform-with-storm-kafka-and-cassandra-nyc-storm-user-group-meetup-21st-nov-2013
-- Jack Krupansky
From: Manoj Khangaonkar
Sent: Thursday, May 8, 2014 5:43 PM
Earlier I reported the following bug against C* 2.0.5
https://issues.apache.org/jira/browse/CASSANDRA-7176
It seems to be fixed in C* 2.0.7, but we are still seeing similar
suspicious timeouts.
We have a cluster of C* 2.0.7, DC1:3, DC2:3
We have the following table:
CREATE TABLE
Yes, the global limits are OK. I added cassandra to '/etc/rc.local' to make
it start automatically, but it seems the modified limits didn't take effect. I
observed this as Bryan suggested, so I added
ulimit -SHn 99
to '/etc/rc.local' before the cassandra start command, and it worked.
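One way to confirm whether a limit change actually took effect is to read the limit from inside a process started the same way. The following is a minimal Python sketch using the standard `resource` module (Unix only); the helper name `open_file_limits` is hypothetical. It reports the limits of the process that runs it, not of an already running Cassandra process.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # resource.RLIM_INFINITY means "unlimited".
    print("open files: soft=%s hard=%s" % (soft, hard))
```

Running this from the same startup script, right before the cassandra start command, shows the limits that a child process would inherit at that point.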
On Thu,
Unfortunately, I found the documentation to be very lackluster. However, I
have actually begun to use the Yammer Metrics library in other projects, so
I have a much better understanding of what it generates. Thank you for the
response!
(also, for some strange reason, I am just getting the email
Nodetool cleanup deletes rows that aren't owned by specific tokens
(shouldn't be on this node). And nodetool repair makes sure data is in sync
between all replicas. It is wrong to say that either of these commands cleans up
tombstones. Tombstones are cleaned up only during compactions, and only if they
are
Perhaps because the developers are working on DSE :-P
On Fri, May 16, 2014 at 8:13 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:
Of the 16 active committers, 8 are not at DataStax. See
http://wiki.apache.org/cassandra/Committers. That said, active
involvement varies and there are
Yes, but still you need to run 'nodetool cleanup' from time to time to make
sure all tombstones are deleted.
On Fri, May 16, 2014 at 10:11 AM, Dimetrio dimet...@flysoft.ru wrote:
Does Cassandra delete tombstones during a simple LCS compaction, or should I
use
nodetool repair?
Thanks.
--
Thanks. My case is that there is no public ip and VPN cannot be set up. It
seems that I have to run EMR job to operate on the AWS cassandra cluster.
I got some timeout errors during running the EMR job as:
java.lang.RuntimeException: Could not retrieve endpoint ranges:
at
Hi Michael, thanks for the reply,
I would RAID0 all those data drives, personally, and give up managing them
separately. They are on multiple PCIe controllers, one drive per channel,
right?
RAID 0 is a simple way to go, but one disk failure can bring the whole
volume down, so I am afraid raid
Hello,
I'm working on data modeling for a Pinterest-like project. There are
basically two main concepts: Pin and Board, just like Pinterest, where pin
is an item containing an image, description and some other information such
as a like count, and each board should contain a sorted list of Pins.
If you make the timestamp the partition key you won't be able to do range
queries (unless you use an ordered partitioner).
Assuming you are logging from multiple devices you will want your partition key
to be the device id plus the date, your clustering key to be the timestamp
(timeuuid are good
Thank you for your answer, I really appreciate that you want to help me.
But I already found out that I did something wrong in my implementation.
On 13.05.2014 02:53, Chris Lohfink wrote:
That is not expected. What client are you using and how are you setting the
ttls? What version of
Hello Anton,
What version of Cassandra are you using? If it is between 1.2.6 and 2.0.6,
setInputRange(startToken, endToken) does not work.
This was fixed in 2.0.7:
https://issues.apache.org/jira/browse/CASSANDRA-6436
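As a quick check of whether a given release falls into the affected range mentioned above, here is a small Python sketch. The function names are hypothetical, and it assumes plain numeric versions of the form `major.minor.patch`; it only encodes the 1.2.6 to 2.0.6 range stated in this message.

```python
def parse_version(version):
    """Turn '1.2.15' into (1, 2, 15) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version, first_bad="1.2.6", last_bad="2.0.6"):
    """True if version lies in the inclusive range [first_bad, last_bad]."""
    v = parse_version(version)
    return parse_version(first_bad) <= v <= parse_version(last_bad)


if __name__ == "__main__":
    for candidate in ("1.2.15", "2.0.6", "2.0.7"):
        print(candidate, is_affected(candidate))
```

Comparing tuples of integers avoids the string-comparison trap where "1.2.15" would sort before "1.2.6".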
If you can't upgrade you can copy AbstractCFIF and CFIF to your project and
My system log is full of messages like this one:
WARN [ReadStage:42] 2014-05-15 08:19:13,615 SliceQueryFilter.java (line
210) Read 0 live and 2829 tombstoned cells in
TrafficServer.rawData.rawData_evaluated_idx (see tombstone_warn_threshold)
I've run a major compaction but the tombstones are not
Let's say I have an external job (MR, pig, etc) sorting a cassandra table
by some complicated mechanism.
We want to store the sorted records BACK into cassandra so that clients can
read the records sorted.
What I was just thinking of doing was storing the records as pages.
So page 0 would have
You can watch this: https://www.youtube.com/watch?v=uoggWahmWYI
Aaron discusses support for big nodes there.
On Wed, May 14, 2014 at 3:13 AM, Yatong Zhang bluefl...@gmail.com wrote:
Thank you Aaron, but we're planning about 20T per node, is that feasible?
On Mon, May 12, 2014 at 4:33
Hello Kevin
For the internal working of secondary index and LIMIT, you can have a look
at this: https://issues.apache.org/jira/browse/CASSANDRA-5975
The comments and attached patch will give you a hint on how LIMIT is
implemented. Alternatively you can look directly in the source code
That and nobarrier… and probably noop for the scheduler if using SSD and
setting readahead to zero...
On Fri, May 16, 2014 at 10:29 AM, James Campbell
ja...@breachintelligence.com wrote:
Hi all—
What partition type is best/most commonly used for a multi-disk JBOD setup
running Cassandra
Hello
I have a 4-node cluster where 2 nodes are in one data center and
another 2 are in a different one.
But in the first data center the token ownership is not equally
distributed. I am using the vnode feature.
num_tokens is set to 256 on all nodes.
initial_token is left blank.
Datacenter: DC1
Hi,
Recommending nobarrier (mount option barrier=0) when you don't know if
a non-volatile cache is in play is probably not the way to go. A
non-volatile cache will typically ignore write barriers if a given
block device is configured to cache writes anyways.
I am also skeptical you will see a
Hi all-
What partition type is best/most commonly used for a multi-disk JBOD setup
running Cassandra on CentOS 64bit?
The datastax production server guidelines recommend XFS for data partitions,
saying, Because Cassandra can use almost half your disk space for a single
file, use XFS when
I was reading this
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra and
need some confirmation:
A Sizing
*Each level is ten times as large as the previous*
In the comments:
At October 14, 2011 at 12:33
What you show is basically the idea of bucketing data. One bucket = one
physical partition. Within each bucket, there is a fixed number of column
(1000 in your example).
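The bucketing idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names and the `base_key:bucket` key format are hypothetical, and the bucket size of 1000 is taken from the example above.

```python
BUCKET_SIZE = 1000  # fixed number of columns per bucket, as in the example


def bucket_for(index, bucket_size=BUCKET_SIZE):
    """Map a global item index to (bucket number, position inside the bucket)."""
    return divmod(index, bucket_size)


def partition_key(base_key, index, bucket_size=BUCKET_SIZE):
    """Build a partition key so that each bucket is its own physical partition."""
    bucket, _ = bucket_for(index, bucket_size)
    return "%s:%d" % (base_key, bucket)


if __name__ == "__main__":
    # Items 0..999 share one partition, item 1000 starts the next one.
    print(partition_key("sorted_table", 999))
    print(partition_key("sorted_table", 1000))
```

A reader that needs items spanning a bucket boundary has to query both partitions.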
This strategy works fine and avoids overly large partitions. The only drawback
I would see is the need to fetch data over buckets
The problem is whether I should denormalize details of pins into the board
table or just retrieve pins by page (page size can be 10~20) and then
multi-get by pin_ids to obtain details
-- Denormalizing is the best way to go in your case. Otherwise, for 1 board
read, you'll have 10-20 subsequent
It's also good to note that only the Data files are compressed already.
Depending on your data the Index and other files may be a significant
percent of total on disk data.
On 05/02/2014 01:14 PM, tommaso barbugli wrote:
In my tests compressing with lzop sstables (with cassandra compression
Hi,
I know this has been discussed before, and I know there are
limitations to how many rows one partition key in practice can handle.
But I am not sure if number of rows or total data is the deciding
factor. I know the thrift interface well, but this is my first
project where we are actively
On Mon, May 12, 2014 at 3:03 PM, Batranut Bogdan batra...@yahoo.com wrote:
I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter,
c counter, d counter
Feel free to comment and share experiences about counter CF performance.
Briefly :
1) Counters original version are
Note that Cassandra will not compact away some tombstones if you have differing
column TTLs. See the following jira and resolution I filed for this:
https://issues.apache.org/jira/browse/CASSANDRA-6654
On May 16, 2014 4:49 PM, Chris Lohfink clohf...@blackbirdit.com wrote:
It will delete them
For now you can edit the nodetool script itself by adding
-Duser.home=/tmp
as in
$JAVA $JAVA_AGENT -cp $CLASSPATH \
      -Xmx32m \
      -Duser.home=/tmp \
      -Dlogback.configurationFile=logback-tools.xml \
      -Dstorage-config=$CASSANDRA_CONF \
      org.apache.cassandra.tools.NodeTool -p $JMX_PORT $ARGS
if
You can always check the project committer wiki:
http://wiki.apache.org/cassandra/Committers
-- Jack Krupansky
From: Kevin Burton
Sent: Wednesday, May 14, 2014 4:39 PM
To: user@cassandra.apache.org
Subject: What % of cassandra developers are employed by Datastax?
I'm curious what % of
Hi, I'm modeling some queries in CQL3.
I'd like to query the first column for each partition key in CQL3.
For example:
create table posts(
author ascii,
created_at timeuuid,
entry text,
primary key(author,created_at)
);
insert into posts(author,created_at,entry) values
Hi Anton,
One approach you could look at is to write a custom InputFormat that
allows you to limit the token range of rows that you fetch (if the
AbstractColumnFamilyInputFormat does not do what you want). Doing so
is not too much work.
If you look at the class RowIterator within
I'm struggling with cassandra secondary indexes since the documentation
seems all over the place and I'm having to put together everything from
blog posts.
Anyway.
Say I have a low-cardinality index of 10 values, and 1M records. This
means each secondary index key will have references to
It will delete them after gc_grace_seconds (set per table) and a compaction.
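The timing rule can be sketched as follows. This is a simplified Python illustration with a hypothetical function name; it only models the gc_grace_seconds condition (the default of 864000 seconds is 10 days) and ignores the other checks a real compaction performs before it drops a tombstone.

```python
DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days


def tombstone_purgeable(deletion_time, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True once gc_grace_seconds have elapsed since the delete was written.

    Both times are in seconds since the epoch. Even when this returns True,
    the tombstone stays on disk until a compaction rewrites its sstable.
    """
    return now >= deletion_time + gc_grace_seconds


if __name__ == "__main__":
    import time

    deleted_at = time.time() - 5 * 24 * 3600  # deleted five days ago
    print(tombstone_purgeable(deleted_at, time.time()))
```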
---
Chris Lohfink
On May 16, 2014, at 9:11 AM, Dimetrio dimet...@flysoft.ru wrote:
Does Cassandra delete tombstones during a simple LCS compaction, or should I use
nodetool repair?
Thanks.
--
View this
There does seem to be some effort trying to encourage others - DataStax had
some talks explaining how to contribute. This year there is even an extra
bootcamp
http://learn.datastax.com/CassandraSummitBootcampApplication.html
On May 16, 2014, at 9:47 AM, Peter Lin wool...@gmail.com wrote:
If the data is read from a slice of a partition that has been added over
time there will be a part of that row in almost every sstable. That would
mean all of them (multiple disk seeks depending on clustering order per
sstable) would have to be read from in order to service the query. Data
I used cassandra for years at NYSE and we were able to do what we wanted with
cassandra by leveraging open source and internal development knowing that
cassandra did what we wanted it to do and that no one could ever take the code
away from us in a worst case scenario.
Compare and contrast
Thanks for your answer, I really like the frequency of update vs read way of
thinking.
A related question is whether it is a good idea to denormalize the read-heavy
part of the data while normalizing the other, less frequently accessed data.
Our app will have a limited number of system managed boards
It's often an excellent strategy. No known issues.
-Tupshin
On May 16, 2014 4:13 PM, Anand Somani meatfor...@gmail.com wrote:
Hi,
It seems like it should be possible to have a keyspace replicated only to
a subset of DC's on a given cluster spanning across multiple DCs? Is there
anything
Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for
further details.
Mark
On Fri, May 9, 2014 at 12:52 PM, Raveendran, Varsha IN BLR STS
varsha.raveend...@siemens.com wrote:
Hello,
I am writing around 10Million records continuously into a single node
Cassandra
Hi,
I am using Cassandra version 2.0.5. I am trying to set up 2 keyspaces with the same
tables for different testing. While creating indexes on the tables, I
realized I am not able to use the same index name even though the tables are in
different keyspaces. Is maintaining unique index name across keyspace is
What version are you using? and what consistency level are you using for your
inserts? A CL.ONE for instance can end up with a large backup in the
replicateOnWrite (or CounterMutation depending on version) stage since it
happens outside the feedback loop from the request and can be a little
Hi,
can anyone point me to recommendations for hosting and configuration
requirements when running a Production Cassandra Cluster at Rackspace?
Are there reference projects that document the suitability of Rackspace for
running a production Cassandra cluster?
Jan
Hi Paulo,
I’m using C* 1.2.15 and have no easy option to upgrade (at least not to 2.0.*
branch).
I’ve started to look if I can implement my variant of InputFormat.
Thanks a lot for the hint, I will for sure check how it’s done in 2.0.6 and
if it’s possible to backport it to 1.2.* branch.
I'm noticing the following strange behaviour when I do a query on a table:
cqlsh:mykeyspace> select uuid, discontinued_from from mytable;
 uuid | discontinued_from
------+-------------------
Hi all,
I'm trying to migrate my old project born with Cassandra 0.6 and grown with 0.7
/1.0 to the latest 2.0.
I have an easy question for you all: do queries using only secondary indexes not
respect any clustering order?
Thanks
Hello Kevin,
In 2.0.X an SSTable is automatically dropped if it contains only
tombstones: https://issues.apache.org/jira/browse/CASSANDRA-5228. However
this will most likely happen if you use LCS. STCS will create sstables of
larger size that will probably have mixed expired and unexpired data.
so 30%… according to that data.
On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote:
On 05/14/2014 03:39 PM, Kevin Burton wrote:
I'm curious what % of cassandra developers are employed by Datastax?
http://wiki.apache.org/cassandra/Committers
--
Kind regards,