Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
"X-store" refers to how data is stored, in almost every case it refers to what logical constructs are grouped together physically on disk. It has nothing to do with whether a database is relational or not. Cassandra does, in fact meet the definition of row-store, however, I would like to

Re: Cassandra data model right definition

2016-10-03 Thread Russell Bradberry
A couple of things I would like to note: 1. Cassandra does not determine how data is stored on disk, the compaction strategy does. One could, in theory (and I believe some are trying), create a column-store compaction strategy. There is a large effort in the database community overall to

Re: Cassandra data model right definition

2016-09-30 Thread Russell Bradberry
I agree 100%, this misunderstanding really bothers me as well.  I like the term “Partitioned Row Store” even though I am guilty of using the legacy “Column-Family Store” from darker times.  Even databases like Scylla which is supposed to be an Apache Cassandra clone tout themselves as a

Re: unsubscibe

2016-08-13 Thread Russell Bradberry
all of the things you can do via email, but who reads instructions?!?! :). I wonder if infra could force a footer on the emails or something. On Sat, Aug 13, 2016 at 8:35 PM Russell Bradberry <rbradbe...@gmail.com> wrote: I think the overall issue here is that t

Re: unsubscibe

2016-08-13 Thread Russell Bradberry
I think the overall issue here is that there are many apps that provide an "unsubscribe" button that automagically sends these emails. I think the best course of action would be to bring this up to the powers that be to possibly decide on supporting this functionality as a feature. This, of

Re: Decommissioned node shows up in the gossip log

2016-05-03 Thread Russell Bradberry
The impact is that it is still in gossip and may still be in your peers. nodetool status pulls from the snitch, not gossip, so since it was decommissioned it will not show up there. The only way to remove it from gossip would be to unsafeAssassinate the endpoint. From: "Zhang, Charles"

Re: apache cassandra for trading system

2016-03-25 Thread Russell Bradberry
One option could be to set up two data centers and have two separate keyspaces, one for today data and the other for historical data. You can write to the today_data keyspace with a TTL of 24 hours then write the same data to the historical_data keyspace. You then set up your replication to
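
A minimal sketch of the dual write described above, assuming a hypothetical table name (`trades`) and hypothetical columns; only the keyspace names `today_data` and `historical_data` and the 24-hour TTL come from the message. It only builds the two CQL statements as strings and does not talk to a cluster.

```python
from datetime import timedelta

# 24 hours expressed in seconds, as CQL's USING TTL expects.
TTL_SECONDS = int(timedelta(hours=24).total_seconds())


def dual_write_statements(table, columns):
    """Build the pair of INSERTs: one expiring, one permanent.

    `table` and `columns` are illustrative placeholders, not names
    taken from the original thread.
    """
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    today = (
        f"INSERT INTO today_data.{table} ({cols}) "
        f"VALUES ({marks}) USING TTL {TTL_SECONDS}"
    )
    historical = (
        f"INSERT INTO historical_data.{table} ({cols}) VALUES ({marks})"
    )
    return today, historical


if __name__ == "__main__":
    for statement in dual_write_statements("trades", ["symbol", "ts", "price"]):
        print(statement)
```

The same bound values would be passed to both statements, so the application writes each record twice and lets the TTL expire the short-lived copy.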

Re: Cassandra DSE Solr - search JSON content in column

2016-01-13 Thread Russell Bradberry
You can use the full text wildcard search as mentioned. However, if you need something more specific like certain fields in the JSON indexed, you can use DSE SOLR field transformers. http://www.datastax.com/dev/blog/dse-field-transformers From: DuyHai Doan Reply-To:

Re: INSERT JSON TimeStamp

2015-09-28 Thread Russell Bradberry
That is not a valid date in CQL, and JSON does not enforce a specific date format. A correctly formatted date would look something like “2015-01-01 00:00:00”. From: Ashish Soni Reply-To: Date: Monday, September 28, 2015 at 3:51 PM To:
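
As a rough illustration of the format mentioned above, the following sketch renders a Python `datetime` in the “2015-01-01 00:00:00” shape before embedding it in a JSON document. The field names `id` and `created` are assumptions made for the example, not columns from the original thread.

```python
import json
from datetime import datetime


def to_cql_timestamp(dt):
    """Format a datetime as 'YYYY-MM-DD HH:MM:SS'."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def build_insert_json(row_id, created):
    """Return a JSON document whose timestamp field is a formatted string.

    `id` and `created` are hypothetical column names.
    """
    return json.dumps({"id": row_id, "created": to_cql_timestamp(created)})


if __name__ == "__main__":
    print(build_insert_json(1, datetime(2015, 1, 1)))
```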

Re: INSERT JSON TimeStamp

2015-09-28 Thread Russell Bradberry
"1" } } }'; On Mon, Sep 28, 2015 at 4:11 PM, Steve Robenalt <sroben...@highwire.org> wrote: Hi Ashish, Most JSON parsers expect either a raw long integer value or some version of an ISO 8601 date or timestamp. See https://en.wikipedia.org/wiki/ISO_8601 for a good reference.

Re: Cassandra Summit 2015 Roll Call!

2015-09-22 Thread Russell Bradberry
I will be wearing a red t-shirt that says SimpleReach and I will be at the reception tonight, the MVP dinner and the summit both days. I'm about 5'11" and probably going to be the best looking person there. ;) See you all at the summit. On Tue, Sep 22, 2015 at 11:27 AM, Robert Coli

Re: Configuring Cassandra to limit number of columns to read

2015-08-14 Thread Russell Bradberry
The idea that you have 250k columns is somewhat of an anti-pattern. In this case you would typically have a few columns and many rows, then just run a select with a limit clause in your partition. From: Jonathan Haddad Reply-To: user@cassandra.apache.org Date: Friday, August 14, 2015 at

Re: only grant select , but still can modify data

2015-08-05 Thread Russell Bradberry
Did you set your authorizer correctly? http://docs.datastax.com/en/cassandra/1.2/cassandra/security/secure_config_native_authorize_t.html -Russ From: Dan Jatnieks Reply-To: user@cassandra.apache.org Date: Wednesday, August 5, 2015 at 5:03 PM To: user@cassandra.apache.org Subject: Re: only

Re: Regarding JIRA

2015-06-01 Thread Russell Bradberry
Also, feel free to use any of the many other resources available. The Documentation Planet Cassandra Stack Overflow #cassandra on irc.freenode.net From: Daniel Compton Reply-To: user@cassandra.apache.org Date: Monday, June 1, 2015 at 3:37 PM To: user@cassandra.apache.org Subject: Re:

Re: Can a Cassandra node accept writes while being repaired

2015-05-07 Thread Russell Bradberry
Yes On Thu, May 7, 2015 at 9:53 AM -0700, Khaja, Raziuddin (NIH/NLM/NCBI) [C] raziuddin.kh...@nih.gov wrote: I was not able to find a conclusive answer to this question on the internet so I am asking this question here. Is a Cassandra node able to accept insert or delete

Re: Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Russell Bradberry
I would like to note that this will require all clients connect over the external IP address. If you have clients within Amazon that need to connect over the private IP address, this would not be possible. If you have a mix of clients that need to connect over private IP address and public,

Re: Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Russell Bradberry
There are a couple options here. You can use the built in address translator, or, write a new load balancing policy. See https://datastax-oss.atlassian.net/browse/JAVA-145 for more information. From: Jonathan Haddad Reply-To: user@cassandra.apache.org Date: Monday, April 20, 2015 at 12:50

Re: Bitmaps

2014-10-06 Thread Russell Bradberry
I highly recommend against storing data structures like this in C*. That really isn't its sweet spot. For instance, if you were to use the blob type which will give you the smallest size, you are still looking at a cell size of (90,000,000/8/1024) = 10,986 KB, or over 10MB in size, which is
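
The arithmetic in the message can be checked with a short sketch. It assumes, as the message does, a bitmap of 90,000,000 bits packed eight to a byte with no per-cell overhead.

```python
BITS = 90_000_000  # bitmap width quoted in the message

size_bytes = BITS / 8            # eight bits per byte
size_kib = size_bytes / 1024     # the (90,000,000/8/1024) step
size_mib = size_kib / 1024       # one more division to reach megabytes

if __name__ == "__main__":
    print(f"{size_bytes:,.0f} bytes")
    print(f"{size_kib:,.0f} KB")
    print(f"{size_mib:.1f} MB")
```

So the figure 10,986 is in kilobytes, and the single cell comes to a little over 10MB.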

Re: hardware sizing for cassandra

2014-09-09 Thread Russell Bradberry
*TL;DR* There is no one recommended setup for Cassandra, everyone's use-case is different and it is up to you to figure out the best setup for your use-case. There are a lot of questions that need to be asked before making a decision on hardware layout. There is just so

Re: hardware sizing for cassandra

2014-09-09 Thread Russell Bradberry
Because RAM is expensive and the JVM heap is limited to 8GB. While you do get benefit out of using extra RAM as page cache, it's often not cost efficient to do so. Again, this is so use-case dependent. I have met several people that run small nodes with fat RAM to get it all in memory to

Re: MapReduce Integration?

2014-08-26 Thread Russell Bradberry
If you want true integration of Cassandra and Hadoop and Spark then you will need to use Datastax Enterprise (DSE).  There are connectors that will allow MapReduce over vanilla Cassandra, however, they are just making requests to Cassandra under the covers while DSE uses CFS which is similar to

Re: Options for expanding Cassandra cluster on AWS

2014-08-19 Thread Russell Bradberry
I’m not sure about Datastax’s official stance, but the SSD-backed instances (e.g. i2.2xl, c3.4xl, etc.) outperform the m2.2xl greatly. Also, since Datastax is pro-SSD, I doubt they would still recommend staying on magnetic disks. That said, I have benchmarked all the way up to the c3.8xl

Re: EC2 SSD cluster costs

2014-08-19 Thread Russell Bradberry
Short answer, it depends on your use-case. We migrated to i2.xlarge nodes and saw an immediate increase in performance.   If you just need plain ole raw disk space and don’t have a performance requirement to meet then the m1 machines would work, or hell even SSD EBS volumes may work for you.  

Re: Cassandra select results differs

2014-07-23 Thread Russell Bradberry
sounds like you may need to run a repair On July 23, 2014 at 12:50:23 PM, Batranut Bogdan (batra...@yahoo.com) wrote: Hello all, I have a CF  CREATE TABLE cf (   a text,   b int,   c int,   d int,   e int,   PRIMARY KEY (a) )  WITH   bloom_filter_fp_chance=0.01 AND   caching='KEYS_ONLY'

Re: Cassandra select results differs

2014-07-23 Thread Russell Bradberry
jobs that repair every week. node 1 - monday , node 2 tuesday . On Wednesday, July 23, 2014 7:52 PM, Russell Bradberry rbradbe...@gmail.com wrote: sounds like you may need to run a repair On July 23, 2014 at 12:50:23 PM, Batranut Bogdan (batra...@yahoo.com) wrote: Hello all, I have a CF

Re: Which way to Cassandraville?

2014-07-22 Thread Russell Bradberry
Having an ORM says nothing about the maturity of a database, it says more about the community and their willingness to create one.  The database itself has nothing to do with the creation of the ORM.  Atop everything else, as was stated, knowing how to model your queries is the most important

Re: EBS SSD - Cassandra ?

2014-06-19 Thread Russell Bradberry
does an elastic network interface really use a different physical network interface? or is it just to give the ability for multiple ip addresses? On June 19, 2014 at 3:56:34 PM, Nate McCall (n...@thelastpickle.com) wrote: If someone really wanted to try this it, I recommend adding an Elastic

Re: running out of diskspace during maintenance tasks

2014-06-18 Thread Russell Bradberry
repair only creates snapshots if you use the “-snapshot” option. On June 18, 2014 at 12:28:58 PM, Marcelo Elias Del Valle (marc...@s1mbi0se.com.br) wrote: AFAIK, when you run a repair a snapshot is created. After the repair, I run nodetool clearsnapshot to save disk space. Not sure it's you

Re: Customized Compaction Strategy: Dev Questions

2014-06-04 Thread Russell Bradberry
You mean this: https://issues.apache.org/jira/browse/CASSANDRA-5228 ? On June 4, 2014 at 12:42:33 PM, Redmumba (redmu...@gmail.com) wrote: Good morning! I've asked (and seen other people ask) about the ability to drop old sstables, basically creating a FIFO-like clean-up process.  Since

Re: Customized Compaction Strategy: Dev Questions

2014-06-04 Thread Russell Bradberry
to try and guess how much data is being put in--since this is auditing data, the usage can vary wildly depending on time of year, verbosity of auditing, etc..  I'd like to maximize the disk space--not optimize the cleanup process. Andrew On Wed, Jun 4, 2014 at 9:47 AM, Russell Bradberry

Re: Customized Compaction Strategy: Dev Questions

2014-06-04 Thread Russell Bradberry
, and probably GC (to remove the old tables), but since I'm not super-familiar with the C* internals, I wanted to make sure it was feasible with the current toolset before I actually dived in and started tinkering. Andrew On Wed, Jun 4, 2014 at 10:04 AM, Russell Bradberry rbradbe...@gmail.com

Re: Customized Compaction Strategy: Dev Questions

2014-06-04 Thread Russell Bradberry
-only table, more or less--the only deletes that occur in the current system are to delete the old data. On Wed, Jun 4, 2014 at 10:24 AM, Russell Bradberry rbradbe...@gmail.com wrote: I’m not sure what you want to do is feasible.  At a high level I can see you running into issues with RF etc

Re: Customized Compaction Strategy: Dev Questions

2014-06-04 Thread Russell Bradberry
a TopologicalCompactionStrategy or similar. On Wed, Jun 4, 2014 at 10:40 AM, Russell Bradberry rbradbe...@gmail.com wrote: Maybe I’m misunderstanding something, but what makes you think that running a major compaction every day will cause the data from January 1st to exist in only one SSTable and not have data from

Re: I don't understand paging through a table by primary key.

2014-05-30 Thread Russell Bradberry
I think what you want is a “clustering column”.  When you model your data, you specify “partition columns” which are synonymous with the old thrift style “keys” and clustering columns.  When creating your PRIMARY KEY, you specify the partition column first then each subsequent column in the
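
A small sketch of how that ordering shows up in a CQL `PRIMARY KEY` clause: the partition column(s) come first, in their own parentheses, followed by the clustering columns. The helper name and the column names `a`, `b`, `c` are hypothetical; the sketch only assembles the clause as a string.

```python
def primary_key_clause(partition_columns, clustering_columns=()):
    """Build a CQL PRIMARY KEY clause.

    Partition columns are wrapped in an inner pair of parentheses;
    clustering columns follow in the order they should sort.
    """
    if not partition_columns:
        raise ValueError("at least one partition column is required")
    partition = "(" + ", ".join(partition_columns) + ")"
    parts = [partition, *clustering_columns]
    return "PRIMARY KEY (" + ", ".join(parts) + ")"


if __name__ == "__main__":
    print(primary_key_clause(["a"], ["b", "c"]))
```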

Re: I don't understand paging through a table by primary key.

2014-05-30 Thread Russell Bradberry
Then the data model you chose is incorrect.  As Rob Coli mentioned, you can not page through partitions that are ordered unless you are using an ordered partitioner.  Your only option is to store the data differently.  When using Cassandra you have to remember to “model your queries, not your

Re: Shouldn't cqlsh have an option for no formatting and no headers?

2014-05-30 Thread Russell Bradberry
cqlsh isn’t designed for dumping data. I think you want COPY  http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/copy_r.html On May 30, 2014 at 2:32:24 PM, Kevin Burton (bur...@spinn3r.com) wrote: I do this all the time with mysql… dump some database table to an output file so

Re: which replica has your data?

2014-04-22 Thread Russell Bradberry
nodetool getendpoints keyspace cf key On April 22, 2014 at 4:52:08 PM, Han,Meng (meng...@ufl.edu) wrote: Hi all, I have a data item whose row key is 7573657238353137303937323637363334393636363230 and I have a five node Cassandra cluster with replication factor set to 3. Each replica's

C* 1.2.15 Decommission issues

2014-04-10 Thread Russell Bradberry
We have about a 30 node cluster running the latest C* 1.2 series DSE.  One datacenter uses VNodes and the other datacenter has VNodes disabled (because it is running DSE-Search). We have been replacing nodes in the VNode datacenter with faster ones and we have yet to have a successful

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Russell Bradberry
I would love to help with the REST interface, however my point was not to add REST into Cassandra.  My point was that if we had an abstract interface that even CQL used to access data, and this interface was made available for other drop in modules to access, then the project becomes extensible

Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Russell Bradberry
@Nate, @Tupshin, this is pretty close to what I had in mind. I would be open to helping out with a formal proposal. On March 12, 2014 at 12:11:41 PM, Tupshin Harper (tups...@tupshin.com) wrote: I agree that we are way off the initial topic, but I think we are spot on the most important

Re: mixed nodes, some SSD some HD

2014-03-05 Thread Russell Bradberry
Are you using the dynamic snitch? Because the SimpleSnitch is the default. On March 5, 2014 at 5:27:03 PM, Elliot Finley (efinley.li...@gmail.com) wrote: Keep in mind, for this 3 node cluster, N = 3. I did a bit more digging and I found this (for future searches on this topic):

Re: in AWS is it worth trying to talk to a server in the same zone as your client?

2014-02-12 Thread Russell Bradberry
Cross zone data transfer does not cost any extra money.  LOCAL_QUORUM = QUORUM if all 6 servers are located in the same logical datacenter.   Ensure your clients are connecting to either the local IP or the AWS hostname that is a CNAME to the local ip from within AWS.  If you connect to the
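
To see why LOCAL_QUORUM and QUORUM coincide with a single logical datacenter, here is a minimal sketch of the replica counts each level needs. The datacenter names and the replication factor of 3 are assumptions for illustration; the message does not state the RF.

```python
def quorum(replication_factors):
    """Replicas needed for QUORUM: a majority of all replicas in all DCs."""
    return sum(replication_factors.values()) // 2 + 1


def local_quorum(replication_factors, datacenter):
    """Replicas needed for LOCAL_QUORUM: a majority within one DC."""
    return replication_factors[datacenter] // 2 + 1


if __name__ == "__main__":
    one_dc = {"us-east": 3}              # hypothetical: single DC, RF=3
    two_dcs = {"us-east": 3, "eu-west": 3}  # hypothetical second DC
    print(quorum(one_dc), local_quorum(one_dc, "us-east"))
    print(quorum(two_dcs), local_quorum(two_dcs, "us-east"))
```

With one datacenter both levels need the same number of replicas; they only diverge once a second datacenter holds replicas of the keyspace.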

Re: in AWS is it worth trying to talk to a server in the same zone as your client?

2014-02-12 Thread Russell Bradberry
transfer is $0.01 / GB (see  http://aws.amazon.com/ec2/pricing/, Data Transfer section). On Wed, Feb 12, 2014 at 3:04 PM, Russell Bradberry rbradbe...@gmail.com wrote: Cross zone data transfer does not cost any extra money.  LOCAL_QUORUM = QUORUM if all 6 servers are located in the same logical

Re: non-vnodes own 0.0% of the ring on nodetool status

2014-02-12 Thread Russell Bradberry
This is normal as nodetool without specifying a keyspace outputs information for the ring as if it is SimpleStrategy with RF=1.  Try specifying a keyspace. On February 12, 2014 at 4:35:31 PM, Paulo Ricardo Motta Gomes (paulo.mo...@chaordicsystems.com) wrote: Hello, After adding a new

Re: CQL list command

2014-02-06 Thread Russell Bradberry
try SELECT * FROM my_table LIMIT 100; On February 6, 2014 at 4:02:26 PM, Andrew Cobley (a.e.cob...@dundee.ac.uk) wrote: TL;DR Is there a CQL equivalent of the CLI List command ? yes or No? Long version I often use the CLI command LIST for debugging or when teaching students showing

Re: vnode in production

2014-01-02 Thread Russell Bradberry
VNodes in production are pretty stable. That being said, I have never heard of anyone doing a successful “nodetool shuffle”.  A few people have skirted the issue by creating a new data center with VNodes enabled and replicating the data over. On January 2, 2014 at 1:52:20 PM, Arindam Barua