Re: CQL 3 returning duplicate keys

2013-06-04 Thread Eric Stevens
If this is a standard column family, not a CQL3 table, then using CQL3 will not give you the results you expect. From cassandra-cli, let's set up some test data: [default@unknown] create keyspace test; [default@unknown] use test; [default@test] create column family test; [default@test] set

Re: CQL 3 returning duplicate keys

2013-06-05 Thread Eric Stevens
away the main feature of the NoSQL store? Or am I missing something obvious here? Regards, Shahab On Tue, Jun 4, 2013 at 2:12 PM, Eric Stevens migh...@gmail.com wrote: If this is a standard column family, not a CQL3 table, then using CQL3 will not give you the results you expect. From

Re: partition key of composite type and where partition_key in (...) clause

2013-06-05 Thread Eric Stevens
---+-++-+ 0|abc | def | 0 | abc | 0xdeadbeef 1|xyz | uvw | 1 | xyz | 0x8badf00d cqlsh:test> SELECT * FROM tbl WHERE k1=0; k1k2 | k3 | k1 | k2 | m ---+-++-+ 0|abc | def | 0 | abc | 0xdeadbeef -Eric Stevens ProtectWise, Inc. On Wed, Jun 5, 2013 at 9:29 AM, Sorin

Re: Dynamic Columns Question Cassandra 1.2.5, Datastax Java Driver 1.0

2013-06-06 Thread Eric Stevens
column family. -Eric Stevens ProtectWise, Inc. On Thu, Jun 6, 2013 at 9:49 AM, Francisco Andrades Grassi bigjoc...@gmail.com wrote: Hi, CQL3 does now support dynamic columns. For tags or metadata values you could use a Collection: http://www.datastax.com/dev/blog/cql3_collections For wide

Re: Dynamic Columns Question Cassandra 1.2.5, Datastax Java Driver 1.0

2013-06-06 Thread Eric Stevens
mutating values are a problem as the collection gets large, or cases where you need to know only a subset of the collection at a time. -Eric Stevens ProtectWise, Inc. On Thu, Jun 6, 2013 at 10:59 AM, Edward Capriolo edlinuxg...@gmail.com wrote: The problem about being careful about how much you

Re: nodetool ring showing different 'Load' size

2013-06-17 Thread Eric Stevens
Load is the size of the storage on disk as I understand it. This can fluctuate during normal usage even if records are not being added or removed; a node's load may be reduced during compaction, for example. During compaction, especially if you use Size Tiered Compaction strategy (the default),

Re: Large number of files for Leveled Compaction

2013-06-17 Thread Eric Stevens
At the DataStax Cassandra Summit 2013 last week, Al Tobey from Ooyala recommended sstable_size_in_mb be set to 256 MB unless you have a fairly small data set. The talk was Extreme Cassandra Optimization, and it was superbly informative; I highly recommend it once DataStax gets the videos online.

Re: Joining distinct clusters with the same schema together

2013-06-19 Thread Eric Stevens
On its face my answer is not... really? What do you view yourself as getting with this technique versus using built in replication? As an example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM consistency level operations? Doing replication manually sounds like a recipe for the

Re: [Cassandra] Replacing a cassandra node

2013-06-21 Thread Eric Stevens
Is there a way to replace a failed server using vnodes? I only had occasion to do this once, on a relatively small cluster. At the time I just needed to get the new server online and wasn't concerned about the performance implications, so I just removed the failed server from the cluster and

Re: timeuuid and cql3 query

2013-06-21 Thread Eric Stevens
It's my understanding that if the first part of the primary key has low cardinality, you will struggle with cluster balance as (unless you use WITH COMPACT STORAGE) the first entry of the primary key equates to the row key from the traditional interface, thus all entries related to

Re: [Cassandra] Replacing a cassandra node

2013-06-27 Thread Eric Stevens
, Eric Stevens migh...@gmail.com wrote: Is there a way to replace a failed server using vnodes? I only had occasion to do this once, on a relatively small cluster. ... Of course that caused a bunch of key reassignments, so I'm sure it would be less work for the cluster if I could bring

Re: columns disappearing intermittently

2013-07-03 Thread Eric Stevens
I wonder if one particular node is having trouble; when you notice the missing column, what happens if you execute the read manually from cqlsh or cassandra-cli independently directly on each node? On Wed, Jul 3, 2013 at 2:00 AM, Blake Eggleston bl...@grapheffect.com wrote: Hi All, We're

Re: going down from RF=3 to RF=2, repair constantly falls over with JVM OOM

2013-07-05 Thread Eric Stevens
The following setting is probably not a good idea: bloom_filter_fp_chance = 1.0 It would disable the bloom filters altogether, and this setting doesn't have appreciably greater benefits over a setting of 0.1 (which has the advantage of saving you from disk I/O 90% of the time for keys which
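To put the 0.1 versus 1.0 comparison in rough numbers, here is a minimal sketch based on the textbook Bloom filter sizing formula, bits per key = ln(1/p) / (ln 2)^2. It is not Cassandra's actual implementation, and the function name bloom_bits_per_key is hypothetical.

```python
import math

def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook optimal bits per key for a target false-positive chance."""
    if not 0.0 < fp_chance <= 1.0:
        raise ValueError("fp_chance must be in (0, 1]")
    return math.log(1.0 / fp_chance) / (math.log(2) ** 2)

for p in (0.01, 0.1, 1.0):
    # Share of lookups for absent keys that the filter lets skip the disk read.
    avoided = 1.0 - p
    print(f"fp_chance={p}: ~{bloom_bits_per_key(p):.1f} bits/key, "
          f"{avoided:.0%} of absent-key reads avoided")
```

Under this formula a setting of 0.1 costs only about five bits per key, while 1.0 saves that small amount of memory and gives up every avoided read.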

Re: General doubts about bootstrap

2013-07-10 Thread Eric Stevens
= Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this imply having a node (with bigger range, 1/2 of the range while other 2 nodes have 1/2 each, supposing 3 nodes) overloaded? I'm referring

Re: Cassandra performance tuning...

2013-07-11 Thread Eric Stevens
You should be able to set the key_validation_class on the column family to use a different data type for the row keys. You may not be able to change this for a CF with existing data without some troubles due to a mismatch of data types; if that's a concern you'll have to create a separate CF and

Re: Representation of dynamically added columns in table (column family) schema using cqlsh

2013-07-12 Thread Eric Stevens
If you're creating dynamic columns via the Thrift interface, they will not be reflected in the CQL3 schema. I would recommend not mixing paradigms like that; either stick with CQL3 or Thrift / cassandra-cli. WITH COMPACT STORAGE creates column families which can be interacted with meaningfully via

Re: Node tokens / data move

2013-07-14 Thread Eric Stevens
My understanding is that it is not possible to change the number of tokens after the node has been initialized. To do so you would first need to decommission the node, then start it clean with the appropriate num_tokens in the yaml. On Fri, Jul 12, 2013 at 9:17 PM, Radim Kolar h...@filez.com

Re: Node tokens / data move

2013-07-16 Thread Eric Stevens
vnodes currently do not bring any noticeable benefits to outweigh trouble The main advantage of vnodes is that it lets you have flexibility with respect to adding and removing nodes from your cluster without having to rebalance your cluster (issuing a lot of moves). A shuffle is a lot of

Re: Node tokens / data move

2013-07-16 Thread Eric Stevens
with the shuffle process and boom, like that, out of disk space. David On Tue, Jul 16, 2013 at 8:35 AM, Eric Stevens migh...@gmail.com wrote: vnodes currently do not bring any noticeable benefits to outweigh trouble The main advantage of vnodes is that it lets you have flexibility with respect

Netstats 100% streaming

2014-11-01 Thread Eric Stevens
We've been commissioning some new nodes on a 2.0.10 community edition cluster, and we're seeing streams that look like they're shipping way more data than they ought for individual files during bootstrap. /var/lib/cassandra/data/x/y/x-y-jb-11748-Data.db 3756423/3715409

Re: Netstats 100% streaming

2014-11-03 Thread Eric Stevens
-7878 which is fixed in 2.0.11 / 2.1.1 Mark On 1 November 2014 14:08, Eric Stevens migh...@gmail.com wrote: We've been commissioning some new nodes on a 2.0.10 community edition cluster, and we're seeing streams that look like they're shipping way more data than they ought for individual files

Re: read after write inconsistent even on a one node cluster

2014-11-06 Thread Eric Stevens
If this is just for doing tests to make sure you get back the data you expect, I would recommend looking into some sort of eventually construct in your testing. We use Specs2 as our testing framework, and our write-then-read tests look something like this: someDAO.write(someObject) eventually {
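For readers not on Specs2, the same retry-until-it-passes idea can be sketched in Python. The helper eventually and its parameters are hypothetical (they are not the Specs2 API or part of any Cassandra driver), and LaggyStore is only a stand-in for a real DAO whose writes become visible after a delay.

```python
import time

def eventually(assertion, retries=40, sleep_s=0.05):
    """Re-run `assertion` until it stops raising AssertionError or retries run out."""
    for attempt in range(retries):
        try:
            return assertion()
        except AssertionError:
            if attempt == retries - 1:
                raise  # out of retries: surface the last failure
            time.sleep(sleep_s)

class LaggyStore:
    """Stand-in store whose writes become visible only after a few reads."""
    def __init__(self, lag_reads):
        self._data, self._pending, self._lag = {}, {}, lag_reads

    def write(self, key, value):
        self._pending[key] = value

    def read(self, key):
        if self._lag > 0:
            self._lag -= 1          # still "propagating"
        else:
            self._data.update(self._pending)
        return self._data.get(key)

store = LaggyStore(lag_reads=3)
store.write("id-1", "some object")

def check():
    assert store.read("id-1") == "some object"

eventually(check, sleep_s=0.001)  # passes on the fourth read
```

The test still fails loudly if the data never shows up, but it no longer depends on the write being visible to the very first read.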

Re: Re[2]: Redundancy inside a cassandra node

2014-11-08 Thread Eric Stevens
They do not use Raid10 on the node, they don't use dual power as well, because it's not cheap in cluster of many nodes I think the point here is that money spent on traditional failure avoidance models is better spent in a Cassandra cluster by instead having more nodes of less expensive

Re: nodetool repair stalled

2014-11-12 Thread Eric Stevens
Wouldn't it be a better idea to issue removenode on the crashed node, wipe the whole data directory (including system) and let it bootstrap cleanly so that it's not part of the cluster while it gets back up to speed? On Tue, Nov 11, 2014, 12:32 PM Robert Coli rc...@eventbrite.com wrote: On Tue,

Re: Two writers appending to a set to see which one wins?

2014-11-16 Thread Eric Stevens
You may be able to do something with conditional updates, however trying to use Cassandra for this kind of coordination smells to me a lot like typical antipatterns (eg write then read or read then write). You probably would do better if you need one writer to consistently win a race condition

Re: Reading the write time of each value in a set?

2014-11-16 Thread Eric Stevens
I'm not aware of a way to query TTL or writetime on collections from CQL yet. You can access this information from Thrift though. On Sat Nov 15 2014 at 12:51:55 AM DuyHai Doan doanduy...@gmail.com wrote: Why don't you use map to store write time as value and data as key? Le 15 nov. 2014

Re: Cassandra DC2 nodes down after increasing write requests on DC1 nodes

2014-11-16 Thread Eric Stevens
load average on DC1 nodes are around 3-5 and on DC2 around 7-10 Anecdotally I can say that loads in the 7-10 range have been dangerously high. When we had a cluster running in this range, the cluster was falling behind on important tasks such as compaction, and we really struggled to

Re: Deduplicating data on a node (RF=1)

2014-11-17 Thread Eric Stevens
If the new node never formally joined the cluster (streaming never completed, it never entered UN state), shouldn't that node be safe to scrub and start over again? It shouldn't be taking primary writes while it's bootstrapping, should it? On Mon Nov 17 2014 at 6:34:04 PM Michael Shuler

Re: Getting the counters with the highest values

2014-11-24 Thread Eric Stevens
You're right that there's no way to use the counter data type to materialize a view ordered by the counter. Computing this post hoc is the way to go if your needs allow for it (if not, something like Summingbird or vanilla Storm may be necessary). I might suggest that you make your primary key

Re: Getting the counters with the highest values

2014-11-25 Thread Eric Stevens
. Thanks for your response. Robert On Nov 24, 2014, at 9:40 AM, Eric Stevens migh...@gmail.com wrote: You're right that there's no way to use the counter data type to materialize a view ordered by the counter. Computing this post hoc is the way to go if your needs allow

Re: Issues in moving data from cassandra to elasticsearch in java.

2014-11-25 Thread Eric Stevens
Consider adding log_bucket timestamp, and then indexing that column. Your data loader can SELECT * FROM logs WHERE log_bucket = ?. The value you supply there would be the timestamp log bucket you're processing - in your case logged_at % 5. However, I'll caution against writing data to Cassandra

Re: multiple threads updating result in TransportException

2014-11-27 Thread Eric Stevens
A lot of people do a lot of multi-threaded work with Datastax Java Driver. It looks like you're using Cassandra Driver 2.0.0-RC2, might I suggest as a first step, at least upgrade to 2.0.0 final? RC2 wasn't even the final release candidate for 2.0.0. On Wed Nov 26 2014 at 8:44:07 AM Brian Tarbox

Re: Data synchronization between 2 running clusters on different availability zone

2014-11-27 Thread Eric Stevens
There's no reason you can't run on multiple cloud providers as long as you treat them as logically distinct DC's. It should largely work the same way as running in several AWS regions, but you'll need to use something like GossipingPropertyFileSnitch because the EC2 snitches are specific to AWS.

Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-27 Thread Eric Stevens
Be careful with creating many dynamically created column families unless you're cleaning up old ones to keep the total number of CF's reasonable. Having many column families will increase memory pressure and reduce overall performance. On Thu Nov 27 2014 at 8:19:35 AM DuyHai Doan

Re: Date Tiered Compaction Strategy and collections

2014-11-28 Thread Eric Stevens
The underlying write time is still tracked for each value in the collection - it's part of how conflict resolution is managed - but it's not exposed through CQL. On Fri Nov 28 2014 at 4:18:47 AM Batranut Bogdan batra...@yahoo.com wrote: Hello all, If one has a table like this: id text, ts

Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-28 Thread Eric Stevens
@Jens, will inactive CFs be released from C*'s memory after i.e. a few days or when under resource pressure? No, certain memory structures are allocated and will remain resident on each node for as long as the table exists. These CFs are used as time buckets, but are to be kept for speedy

Re: Recommissioned node is much smaller

2014-12-03 Thread Eric Stevens
How does the difference in load compare to the effective ownership? If you deleted the system directory as well, you should end up with new ranges, so I'm wondering if perhaps you just ended up with a really bad shuffle. Did you run removenode on the old host after you took it down (I assume so

Re: Cassandra taking snapshots automatically?

2014-12-03 Thread Eric Stevens
Do you have snapshot_before_compaction enabled? http://datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__snapshot_before_compaction On Wed Dec 03 2014 at 10:25:12 AM Robert Wille rwi...@fold3.com wrote: I built my first

Re: Recommissioned node is much smaller

2014-12-03 Thread Eric Stevens
high correlation. I think the moral of the story is that I shouldn’t delete the system directory. If I have issues with a node, I should recommission it properly. Robert On Dec 3, 2014, at 10:23 AM, Eric Stevens migh...@gmail.com wrote: How does the difference in load compare

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-06 Thread Eric Stevens
B would work better in the case where you need to do sequential or ranged style reads on the id, particularly if id has any significant sparseness (eg, id is a timeuuid). You can compute the buckets and do reads of entire buckets within your range. However if you're doing random access by id,
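As a rough illustration of computing the buckets for a ranged read, the sketch below assumes the bucket is derived as id // bucket_size. The truncated message does not confirm how option B derives its buckets, so both that rule and the name buckets_for_range are assumptions.

```python
def buckets_for_range(start, end, bucket_size):
    """Return every bucket number touched by the inclusive id range [start, end]."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    if end < start:
        raise ValueError("end must not be smaller than start")
    return list(range(start // bucket_size, end // bucket_size + 1))

# A ranged read then becomes one whole-bucket read per entry in this list,
# followed by trimming the first and last bucket to the exact range.
print(buckets_for_range(95, 230, 100))
```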

Re: nodetool repair exception

2014-12-06 Thread Eric Stevens
The official recommendation is 100k: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html I wonder if there's an advantage to this over unlimited if you're running servers which are dedicated to your Cassandra cluster (which you should be for

Re: How to model data to achieve specific data locality

2014-12-06 Thread Eric Stevens
It depends on the size of your data, but if your data is reasonably small, there should be no trouble including thousands of records on the same partition key. So a data model using PRIMARY KEY ((seq_id), seq_type) ought to work fine. If the data size per partition exceeds some threshold that

Re: Keyspace and table/cf limits

2014-12-06 Thread Eric Stevens
Based on recent conversations with Datastax engineers, the recommendation is definitely still to run a finite and reasonable set of column families. The best way I know of to support multitenancy is to include tenant id in all of your partition keys. On Fri Dec 05 2014 at 7:39:47 PM Kai Wang

Re: How to model data to achieve specific data locality

2014-12-07 Thread Eric Stevens
...@gmail.com wrote: On Sat, Dec 6, 2014 at 11:18 AM, Eric Stevens migh...@gmail.com wrote: It depends on the size of your data, but if your data is reasonably small, there should be no trouble including thousands of records on the same partition key. So a data model using PRIMARY KEY ((seq_id

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
Hi Joy, Are you resetting your data after each test run? I wonder if your tests are actually causing you to fall behind on data grooming tasks such as compaction, and so performance suffers for your later tests. There are *so many* factors which can affect performance, without reviewing test

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-07 Thread Eric Stevens
, you're probably already in insane and difficult to recover crisis mode). On Sun Dec 07 2014 at 8:55:47 AM Eric Stevens migh...@gmail.com wrote: Hi Joy, Are you resetting your data after each test run? I wonder if your tests are actually causing you to fall behind on data grooming tasks

Re: How to model data to achieve specific data locality

2014-12-08 Thread Eric Stevens
with full CQL syntax.) would be very helpful. I mean, Cassandra has no “subset” concept, nor a “load subset” command, so what are we really talking about? Also, I presume we are talking CQL, but some of the references seem more Thrift/slice oriented. -- Jack Krupansky *From:* Eric Stevens migh

Re: Cassandra Doesn't Get Linear Performance Increment in Stress Test on Amazon EC2

2014-12-08 Thread Eric Stevens
calculate. Could you please describe in detail about your test deployment? Thank you very much, Joy 2014-12-07 23:55 GMT+08:00 Eric Stevens migh...@gmail.com: Hi Joy, Are you resetting your data after each test run? I wonder if your tests are actually causing you to fall behind on data

Re: UPDATE statement is failed

2014-12-10 Thread Eric Stevens
Writing then immediately reading the same data (or reading then immediately writing) are both antipatterns in any eventually consistent system, Cassandra included. You may need to investigate Compare and Set operations and see if they will work for your needs. Or else look into Serial

Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Eric Stevens
We're considering moving to a model where we put each of our tables in a dedicated keyspace. This is so we can tune replication per table, and change our mind about that replication on a per-table basis without a major migration. The biggest driver for this is Solr integration, we want to tune

Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Eric Stevens
at 11:21 AM, Eric Stevens migh...@gmail.com wrote: We're considering moving to a model where we put each of our tables in a dedicated keyspace. This is so we can tune replication per table, and change our mind about that replication on a per-table basis without a major migration. The biggest

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
Jon, The really important thing to really take away from Ryan's original post is that batches are not there for performance. tl;dr: you probably don't want batch, you most likely want many async calls My own rudimentary testing does not bear this out - at least not if you mean to say that

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
You can see what the partition key strategies are for each of the tables, test5 shows the least improvement. The set (aid, end) should be unique, and bckt is derived from end. Some of these layouts result in clustering on the same partition keys, that's actually tunable with the ~15 per bucket

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
, Eric Stevens migh...@gmail.com wrote: You can see what the partition key strategies are for each of the tables, test5 shows the least improvement. The set (aid, end) should be unique, and bckt is derived from end. Some of these layouts result in clustering on the same partition keys, that's

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
code in a gist or something? I can't really talk about your benchmark without seeing it and you're basing your stance on the premise that it is correct, which it may not be. On Sat Dec 13 2014 at 8:45:21 AM Eric Stevens migh...@gmail.com wrote: You can see what the partition key strategies

Re: batch_size_warn_threshold_in_kb

2014-12-13 Thread Eric Stevens
to model my application. On Sat, Dec 13, 2014 at 10:58 AM, Eric Stevens migh...@gmail.com wrote: Isn't the net effect of coordination overhead incurred by batches basically the same as the overhead incurred by RoundRobin or other non-token-aware request routing? As the cluster size increases

Re: batch_size_warn_threshold_in_kb

2014-12-15 Thread Eric Stevens
), end, proto) reverse order= 25,163,064,000 traverse test5 ((aid, bckt, end)) = 30,233,744,000 On Sat, Dec 13, 2014 at 11:07 AM, Jonathan Haddad j...@jonhaddad.com wrote: On Sat Dec 13 2014 at 10:00:16 AM Eric Stevens migh...@gmail.com wrote: Isn't the net

Re: batch_size_warn_threshold_in_kb

2014-12-16 Thread Eric Stevens
be reading it wrong. Sorry I don't have more time to debug the script. Any of the above ideas apply? Jon On Mon Dec 15 2014 at 1:11:43 PM Eric Stevens migh...@gmail.com wrote: Unfortunately my Scala isn't the best so I'm going to have to take a little bit to wade through the code. I

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Eric Stevens
No, deletes are always written as a tombstone no matter the consistency. This is because data at rest is written to sstables which are immutable once written. The tombstone marks that a record in another sstable is now deleted, and so a read of that value should be treated as if it doesn't exist.
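The reasoning above can be sketched with a toy model in which each sstable is an immutable dict and a read merges them by timestamp. This is only an illustration: the names TOMBSTONE and read are hypothetical, and real Cassandra reads involve memtables, compaction, and gc_grace_seconds, none of which are modelled here.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"

def read(key, sstables):
    """Return the newest value for key across sstables, or None if deleted or absent.

    Each sstable is modelled as an immutable dict of key -> (timestamp, value).
    """
    newest = None
    for table in sstables:
        cell = table.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]

older = {"k": (100, "v1")}       # the original write, never modified in place
newer = {"k": (200, TOMBSTONE)}  # the delete lands in a separate, later sstable

print(read("k", [older]))         # v1
print(read("k", [older, newer]))  # None
```

Without the tombstone in the newer table there would be nothing to tell the read that the value in the older table no longer counts, whatever consistency level the delete was issued at.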

Re: Replacing nodes disks

2014-12-22 Thread Eric Stevens
You should be able to use Cassandra's built in tooling for sure. But just be aware that restoring from a backup of the data will be a lot faster and won't introduce any stress on the existing cluster. Repair and replace operations aren't free to the other nodes, so an offline backup and restore is

Re: installing cassandra

2014-12-22 Thread Eric Stevens
If you're just trying to get your feet wet with distributed software, and your node count is going to be reasonably low and won't grow any time soon, it's probably easier to just install it yourself rather than trying to also learn how to use software deployment technologies like puppet or chef.

Re: CQL3 vs Thrift

2014-12-24 Thread Eric Stevens
As Ryan mentioned, CQL is simply a translation layer to the underlying storage mechanism you're already familiar with from Thrift. There are definitely corner cases where it's not possible to get a one-for-one equivalent in CQL vs Thrift, and even when there's equivalents, the underlying data

Re: Counter Column

2014-12-26 Thread Eric Stevens
Timestamps are timezone independent. This is a property of timestamps, not a property of Cassandra. A given moment is the same timestamp everywhere in the world. To display this in a human readable form, you then need to know what timezone you're attempting to represent the timestamp as, this is
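A short sketch of the same point using only the Python standard library: one stored instant, rendered in two display zones. The epoch value and the two fixed-offset zones are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta, timezone

epoch_ms = 1419614400000  # one instant, stored with no timezone attached

instant = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)

# Two hypothetical display zones, defined as fixed offsets from UTC.
zone_a = timezone(timedelta(hours=5, minutes=30), "UTC+05:30")
zone_b = timezone(timedelta(hours=-5), "UTC-05:00")

# The rendered wall-clock text differs per zone...
print(instant.astimezone(zone_a).isoformat())
print(instant.astimezone(zone_b).isoformat())

# ...but both renderings refer to the same underlying timestamp.
print(instant.astimezone(zone_a).timestamp() == instant.astimezone(zone_b).timestamp())
```

The timezone only enters at display time; nothing about the stored value changes.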

Re: Why read row is so slower than read column.

2014-12-26 Thread Eric Stevens
I would suggest enabling tracing in cqlsh and seeing what it has to say. There are many things which could cause this, but I'm thinking in particular you may have a lot of tombstones which get lifted when you read the whole row, and are missed when you read just one column. On Fri, Dec 26, 2014 at

Re: Counter Column

2014-12-27 Thread Eric Stevens
. Is that anyway we can avoid it and Cassandra assume the current time of the server? Thanks Ajay On Dec 26, 2014 10:50 PM, Eric Stevens migh...@gmail.com wrote: Timestamps are timezone independent. This is a property of timestamps, not a property of Cassandra. A given moment is the same

Re: any code to load large data from web into Cassandra

2014-12-27 Thread Eric Stevens
I think Joanne is talking not about bulk loading, but about just general access as in any standard client driver. Joanne, this is a pretty broad topic. You would need to have some part of a website built in some language such as Python or Java or some other language. Then you would use an

Re: Re: Why read row is so slower than read column.

2014-12-27 Thread Eric Stevens
Can you send us your exact data model? Even though you normally use Thrift, you may also be able to access the data from CQL, and if so, query tracing is a very powerful feature in CQL which may describe why there is a performance difference. Do you do deletes of data? If so, tombstones really

Re: Best practice for sorting on frequent updated column?

2014-12-29 Thread Eric Stevens
This is a bit difficult. Depending on your access patterns and data volume, I'd be inclined to keep a separate table with a (count, foreign_key) clustering key. Then do a client-side join to read the data back in the order you're looking for. That will at least make the heavily updated table

Re: User click count

2014-12-29 Thread Eric Stevens
If the counters get incorrect, it couldn't be corrected You'd have to store something that allowed you to correct it. For example, the TimeUUID approach to keep true counts, which are slow to read but accurate, and a background process that trues up your counter columns periodically. On Mon,
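A toy, in-memory sketch of that idea: an exact log keyed by time-based UUIDs alongside a fast counter that may drift, plus a true-up pass. All names here (click_log, fast_counts, record_click, true_up) are hypothetical, and the dict and Counter merely stand in for a Cassandra table and counter column.

```python
import uuid
from collections import Counter

click_log = {}           # (user, event_id) -> True; slow to count, but exact
fast_counts = Counter()  # quick to read, may drift from the truth

def record_click(user, counter_update_lost=False):
    """Log the click exactly; the fast counter update may be lost (simulated)."""
    click_log[(user, uuid.uuid1())] = True
    if not counter_update_lost:
        fast_counts[user] += 1

def true_up():
    """Background pass: recount from the exact log and overwrite the counters."""
    exact = Counter(user for user, _ in click_log)
    fast_counts.clear()
    fast_counts.update(exact)

record_click("user-1")
record_click("user-1", counter_update_lost=True)  # simulated lost increment
record_click("user-1")

print(fast_counts["user-1"])  # 2, drifted
true_up()
print(fast_counts["user-1"])  # 3, corrected from the log
```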

Re: CQL3 vs Thrift

2014-12-29 Thread Eric Stevens
So while not exactly the same, this seems like a good analogy for suggesting a third interface to fix problems with existing interfaces: http://xkcd.com/927/ Even if the CQL parsing code in Cassandra is subpar (I haven't studied it), that's not an especially compelling case to suggest replacing

Re: User click count

2014-12-31 Thread Eric Stevens
tombstones (say by default 20 days). Thanks Ajay On Mon, Dec 29, 2014 at 7:47 PM, Eric Stevens migh...@gmail.com wrote: If the counters get incorrect, it couldn't be corrected You'd have to store something that allowed you to correct it. For example, the TimeUUID approach to keep true

Re: is primary key( foo, bar) the same as primary key ( foo ) with a ‘set' of bars?

2015-01-02 Thread Eric Stevens
And also stored entirely for each UPDATE. Change one element, re-serialize the whole thing to disk. Is this true? I thought updates (adds, removes, but not overwrites) affected just the indicated columns. Isn't it just the reads that involve reading the entire collection? DS docs talk about

Re: Storing large files for later processing through hadoop

2015-01-02 Thread Eric Stevens
Can this split and combine be done automatically by cassandra when inserting/fetching the file without application being bothered about it? There are client libraries which offer recipes for this, but in general, no. You're trying to do something with Cassandra that it's not designed to do. You

Re: Re: Dynamic Columns

2015-01-26 Thread Eric Stevens
are you really recommending I throw 4 years of work out and completely rewrite code that works and has been tested? Our codebase was about 3 years old, and we finished migrating it to CQL not that long ago. It can definitely be frustrating to have to touch stable code to modernize it. Our

Re: Controlling the MAX SIZE of sstables after compaction

2015-01-26 Thread Eric Stevens
If you're concerned about impacting production performance, the steps of compacting and sstable2json will almost certainly also cause performance problems if performed on the same hardware. You won't get away from a production performance impact as long as you're using production hardware. If

Re: SStables can't compat automaticly

2015-01-26 Thread Eric Stevens
If you are doing only writes and no reads, then 'cold_reads_to_omit' is probably preventing your cluster from crossing a threshold where it decides it needs to engage in compaction. Setting it to 0.0 should fix this, but remember that you tuned it as you should be able to revert it to default

Re: Using Cassandra for geospacial search

2015-01-26 Thread Eric Stevens
Using Cassandra triggers is generally a fairly dangerous proposition, and generally not recommended. It's probably a better idea to load your search data with a separate process. On Mon, Jan 26, 2015 at 11:42 AM, Brian Sam-Bodden bsbod...@integrallis.com wrote: I did a little experiment

Re: Tombstone gc after gc grace seconds

2015-01-26 Thread Eric Stevens
My understanding is consistent with Alain's, there's no way to force a tombstone-only compaction, your only option is major compaction. If you're using size tiered, that comes with its own drawbacks. I wonder if there's a technical limitation that prevents introducing a shadowed data cleanup

Re: Using Cassandra for geospacial search

2015-01-26 Thread Eric Stevens
...@gmail.com wrote: That's actually GREAT news !! + Solr will give a lot of feature to Cassandra ! But while waiting for this huge feature (and wanted for a lot of users I guess) I guess that Prefix search will also be useful for using geohash... 2015-01-26 18:12 GMT+01:00 Eric Stevens migh

Re: Fixtures / CI docker

2015-01-26 Thread Eric Stevens
I don't have directly relevant advice, especially WRT getting a meaningful and coherent subset of your production data - that's probably too closely coupled with your business logic. Perhaps you can run a testing cluster with a default TTL on all your tables of ~2 weeks, feeding it with real

Re: Smart column searching for a particular rowKey

2015-02-04 Thread Eric Stevens
for a particular rowKey Thanks, it does. How about in astyanax? *From:* Eric Stevens [mailto:migh...@gmail.com] *Sent:* Tuesday, February 03, 2015 1:49 PM *To:* user@cassandra.apache.org *Subject:* Re: Smart column searching for a particular rowKey WHERE + ORDER DESC + LIMIT

Re: Help on modeling a table

2015-02-02 Thread Eric Stevens
Just a minor observation: those field names are extremely long. You store a copy of every field name with every value with only a couple of exceptions: http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html Your partition key column name

Re: Cassandra on Ceph

2015-02-02 Thread Eric Stevens
Colin, I'm not familiar with Ceph, but it sounds like it's a more sophisticated version of a SAN. Be aware that running Cassandra on absolutely anything other than local disks is an anti-pattern. It will have a profound negative impact on performance, scalability, and reliability of your

Re: Smart column searching for a particular rowKey

2015-02-03 Thread Eric Stevens
WHERE + ORDER DESC + LIMIT should be able to accomplish that. On Tue, Feb 3, 2015 at 11:28 AM, Ravi Agrawal ragra...@clearpoolgroup.com wrote: Hi Guys, Need help with this. My rowKey is stockName like GOOGLE, APPLE. Columns are sorted as per timestamp and they include some set of data

Re: Mutable primary key in a table

2015-02-07 Thread Eric Stevens
I'm struggling to think of a model where it makes sense to update a primary key as a typical operation. It suggests, as Adil said, that you may be reasoning wrong about your data model. Maybe you can explain your problem in more detail - what kind of thing has you updating your PK on a regular

Re: Mutable primary key in a table

2015-02-08 Thread Eric Stevens
policy for old user names. For example, can they be reused, or are they locked, or... whatever. -- Jack Krupansky On Sun, Feb 8, 2015 at 1:48 AM, Ajaya Agrawal ajku@gmail.com wrote: On Sun, Feb 8, 2015 at 5:03 AM, Eric Stevens migh...@gmail.com wrote: I'm struggling to think of a model

Re: Timeseries: Include rows immediately adjacent to range query?

2015-01-15 Thread Eric Stevens
It seems like you should be able to solve it with two more queries immediately after your first query: SELECT * FROM timeseries WHERE tstamp < ${MIN(firstQuery.tstamp)} LIMIT 1 SELECT * FROM timeseries WHERE tstamp > ${MAX(firstQuery.tstamp)} LIMIT 1 On Tue, Jan 13, 2015 at 9:31 AM, Hugo José
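The intent, fetching the range plus the single closest row on either side, can be sketched on an in-memory sorted list. This only models the result the extra queries are meant to produce; the function name with_neighbours is hypothetical and no Cassandra driver is involved.

```python
import bisect

def with_neighbours(tstamps, lo, hi):
    """Rows with lo <= t <= hi, plus the closest row on each side if one exists.

    `tstamps` must be sorted ascending.
    """
    i = bisect.bisect_left(tstamps, lo)    # first index inside the range
    j = bisect.bisect_right(tstamps, hi)   # one past the last index inside it
    return tstamps[max(i - 1, 0):j + 1]    # widen by one row on each side

rows = [10, 20, 30, 40, 50]
print(with_neighbours(rows, 25, 35))  # [20, 30, 40]
```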

Re: Storing PDF data on Cassandra db

2015-01-15 Thread Eric Stevens
@DENIZ, Jon's point is that CQL is the new standard, Thrift is frozen and being deprecated. Anything you build using the Thrift interface will hurt you over time, so you ought to just go for CQL. There really is next to no reason not to use CQL aside from personal preference, and that argument

Re: Many really small SSTables

2015-01-15 Thread Eric Stevens
Yes, many sstables can have a huge negative impact on read performance, and will also create memory pressure on that node. There are a lot of things which can produce this effect, and it also strongly suggests you're falling behind on compaction in general (check nodetool compactionstats, you should

Re: Growing SSTable count as Cassandra does not saturate the disk I/O

2015-01-15 Thread Eric Stevens
compactors do not seem to help). Oddly enough, one node has just 160 SSTables while the rest are at 500-600 tables. Is size-tiered compaction easier on the CPU than leveled compaction? Thanks, William *From:* Eric Stevens [mailto:migh...@gmail.com] *Sent:* 12 January 2015 14:51

Re: Compaction failing to trigger

2015-01-20 Thread Eric Stevens
@Rob - he's probably referring to the thread titled Reasons for nodes not compacting? where Tyler speculates that the tables are falling below the cold read threshold for compaction. He speculated it may be a bug. At the same time in a different thread, Roland had a similar problem, and Tyler's

Re: Is there a way to add a new node to a cluster but not sync old data?

2015-01-21 Thread Eric Stevens
: Thanks for the reply. The bootstrap of new node put a heavy burden on the whole cluster and I don't know why. So that's the issue I want to fix actually. On Mon, Jan 12, 2015 at 6:08 AM, Eric Stevens migh...@gmail.com wrote: Yes, but it won't do what I suspect you're hoping for. If you

Re: number of replicas per data center?

2015-01-19 Thread Eric Stevens
Ah.. six replicas. At least it's super inexpensive that way (sarcasm!) Well it's up to you to decide what your data locality and fault tolerance requirements are. If you want to run two DC's, costs are going to increase since each DC has a full set of replicas within itself. But you get the

Re: Cassandra fetches complete partition

2015-01-19 Thread Eric Stevens
It depends on your version of Cassandra. I would suggest starting with this, which describes the differences between 2.0 and 2.1 http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1 In particular: In previous releases, this cache has required storing the entire partition in memory,

Re: Nodetool removenode stuck

2015-01-19 Thread Eric Stevens
I've seen removenode hang indefinitely also (per CASSANDRA-6542). Generally speaking, if a node is in good health and you want to take it out of the cluster for whatever reason (including the one you mentioned), nodetool decommission is a better choice. Removenode is for when a node is

Re: Many really small SSTables

2015-01-16 Thread Eric Stevens
see this only on testing cluster). It looks to me that compactions were not triggered. I tried a nodetool compact on one node overnight - but that crashed the entire node. Roland On 15.01.2015 at 19:14, Eric Stevens wrote: Yes, many sstables can have a huge negative impact on read performance

Re: Retrieving all row keys of a CF

2015-01-16 Thread Eric Stevens
Note that getAllRows() is deprecated in Astyanax (see here https://github.com/Netflix/astyanax/wiki/Getting-Started#iterate-through-the-entire-keyspace-deprecated ). You should prefer to use the AllRowsReader recipe: https://github.com/Netflix/astyanax/wiki/AllRowsReader-All-rows-query Note the

Re: Retrieving all row keys of a CF

2015-01-17 Thread Eric Stevens
If you're getting partial data back, then failing eventually, try setting .withCheckpointManager() - this will let you keep track of the token ranges you've successfully processed, and not attempt to reprocess them. This will also let you set up tasks on bigger data sets that take hours or days

Re: “Not enough replica available” when consistency is ONE?

2015-01-18 Thread Eric Stevens
Check out http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_tunable_consistency_c.html Cassandra 2.0 uses the Paxos consensus protocol, which resembles 2-phase commit, to support linearizable consistency. All operations are quorum-based ... This kicks in whenever you do CAS

Re: changes to metricsReporterConfigFile requires restart of cassandra?

2015-02-11 Thread Eric Stevens
AFAIK yes. If you want just a subset of the metrics, I would suggest exporting them all, and filtering on the Graphite side. On Wed, Feb 11, 2015 at 6:54 AM, Erik Forsberg forsb...@opera.com wrote: Hi! I was pleased to find out that cassandra 2.0.x has added support for pluggable metrics
