Re: Why so many vnodes?

2013-06-10 Thread Milind Parikh
There are n vnodes regardless of the size of the physical cluster. Regards Milind On Jun 10, 2013 7:48 AM, Theo Hultberg t...@iconara.net wrote: Hi, The default number of vnodes is 256, is there any significance in this number? Since Cassandra's vnodes don't work like for example Riak's,

Re: ETL Tools to transfer data from Cassandra into other relational databases

2012-12-13 Thread Milind Parikh
Why would you use Cassandra as the primary store for logging information? Have you considered Kafka? You could, of course, then fan out the logs to Cassandra (on a near-real-time basis) and then on a daily basis (if you wish) extract the deltas from Kafka into an RDBMS, with no Pig/Hive etc.

RE: Cassandra Counters

2012-09-24 Thread Milind Parikh
IMO, you would use Cassandra counters (or another variation of distributed counting) once you have determined that a centralized version of counting is not going to work. You'd determine the infeasibility of centralized counting by figuring the speed at which you need to sustain writes and

Re: Data aggregation -- help me design a solution

2012-08-21 Thread Milind Parikh
1. Assuming that the majority of the line items are new and 2. The lookup of an existing line item will dictate the performance of the system, because reads are slower than writes in C*. 3. Assuming that you are using counters in C*. Therefore eliminate that problem by implementing a bloom

Re: How to process new rows in parallel?

2012-08-03 Thread Milind Parikh
Kafka is relatively stable and has an active, well-supported newsgroup as well. As discussed by Brian, you would be inverting the store-process paradigm. Essentially, in your original approach, you are storing the messages first and then processing them after the fact. In the Kafka model, you

Re: CounterColumns with double, min/max

2012-05-25 Thread Milind Parikh
On 1, countandra.org. On 2, the issue is a little deeper (we have investigated this at Countandra). To approach it a little more comprehensively, the issue has more to do with events than with counts (at least IMO). A similar issue is about averages... Countandra does sums and counts

Re: Flume and Cassandra

2012-02-22 Thread Milind Parikh
Cool. www.countandra.org calls them cascaded counters and it will also be based on Kafka. /*** sent from my android...please pardon occasional typos as I respond @ the speed of thought / On Feb 22, 2012 7:22 PM, Edward Capriolo edlinuxg...@gmail.com

Re: Cassandra to Oracle?

2012-01-22 Thread Milind Parikh
The composite-key approach with counters would work very well in this case. It will also obviate the concern of not knowing the exact column names a priori, although for efficiency you might want to look at maintaining a secondary cache-like CF for lookup. Depending on your data patterns (not to

Re: Cassandra to Oracle?

2012-01-22 Thread Milind Parikh
My bad ~s/X:X-Value/Y:Y-Value/ after rereading the SELECT. On Jan 22, 2012 6:40 AM, Milind Parikh milindpar...@gmail.com wrote: The composite-key approach

Re: Data Model Question

2012-01-21 Thread Milind Parikh
I used rainbird as inspiration for Countandra (some of the publicly available data structures from the rainbird preso). That said, there are significant differences between the two architectures. Additionally, as Cassandra begins to provide triggers, some very interesting things will become possible in

Re: How to store unique visitors in cassandra

2012-01-19 Thread Milind Parikh
You might want to look at the code at countandra.org, regardless of whether you use it. It uses a model of dynamic composite keys (although static composite keys would have worked as well). For the actual query, only one row is hit. This of course only works because the data model is attuned for the

Announcing Countandra 0.5

2012-01-10 Thread Milind Parikh
Inspired by Twitter's rainbird project, Countandra is a hierarchical distributed counting engine at scale. It provides a complete HTTP-based interface for both posting events and running queries. The syntax of an event posting is FORMS-compatible. The result of the query is emitted in

Re: data agility

2011-11-20 Thread Milind Parikh
For 99% of current applications requiring a persistent datastore, Oracle, PgSQL and MySQL variants will suffice. For the 1% of the applications, consider C* if (a) you have given up on distributed transactions (ACIDLY; but NOT BASEICLY) (b) wondering about this newfangled

Re: Multi DC setup

2011-10-10 Thread Milind Parikh
Why have two rings? Cassandra manages the replication for you; one ring with physical nodes in two DCs might be a better option. Of course, depending on the inter-DC failure characteristics, you might need to endure split-brain for a while.

Re: Queue suggestion in Cassandra

2011-09-16 Thread Milind Parikh
Use ZooKeeper. Scott Fines has a great library on top of ZK. On Fri, Sep 16, 2011 at 7:08 PM, Daning Wang dan...@netseer.com wrote: We are trying to implement an ordered queue system in Cassandra (ver 0.8.5). In the initial design we use a row as the queue and a column for each item in the queue. That means

Re: Using Cassandra as a client data store

2011-08-18 Thread Milind Parikh
Why not use CouchDB for this use case? Milind On Aug 18, 2011 9:07 PM, Nicholas Neuberger nneuberg...@gmail.com wrote: I've been using Cassandra as a

Predictable low RW latency, SLABS and STW GC

2011-07-22 Thread Milind Parikh
In order to be predictable @ big data scale, the intensity and periodicity of STW garbage collection have to be brought down. Assume that SLABS (Cass 2252) will be available in the main line at some time and assume that this will have the impact that other projects (HBase etc.) are reporting. I

Re: one way to make counter delete work better

2011-06-14 Thread Milind Parikh
If I understand this correctly, then the epoch integer would be generated by each node. Since time always flows forward, the assumption would be, I suppose, that the epochs would be tagged with the node that generated them and additionally the counter would carry as much history as necessary (and

Re: rainbird question (why is the 1minute buffer needed?)

2011-05-22 Thread Milind Parikh
I believe that the key reason is souped-up performance for the most recent data. And yes, an intelligent flush leaves you vulnerable to some data loss. On May

Re: Cassandra Vs. Oracle Coherence

2011-05-20 Thread Milind Parikh
Other interesting flavors of distributed cache: Terracotta, GemFire. Together with a complex event processing engine like OCEP, these drive a lot of low-latency, high-frequency trading where nanoseconds matter.

Re: nodes reference by hostname and not IP

2011-04-27 Thread Milind Parikh
Most likely because in the wild, you can't assume a reliable DNS. Just as an aside... This question comes up often in the context of managing Cassandra clusters, especially in elastic situations. Most CMDBs assume a static name (host names/static IPs) for nodes. However, this often proves to be

Re: IP address resolution in MultiDC setup (EC2)/VIP

2011-04-26 Thread Milind Parikh
At the risk of repeating the previous conclusions: (a) This configuration obviates the need for a patch that I had posted earlier. This is a good thing. (b) The reported latency (@Sasha) is less than ordinary latencies in EC2. The reasons behind this are not well understood. However, I wouldn't

Re: Manual Conflict Resolution in Cassandra

2011-04-25 Thread Milind Parikh
On Apr 25, 2011 3:54 AM, David Strauss da...@davidstrauss.net wrote: On Fri, 2011-04-22 at 13:31 -0700, Milind Parikh wrote: Is there a chance of getting manual confli... You can actually already perform manual conflict resolution in Cassandra

Re: IP address resolution in MultiDC setup

2011-04-25 Thread Milind Parikh
award both external and internal IP address for each node? Or do we have to explicitly buy the external IPs? I am looking into overlay networks. On Mon, Apr 25, 2011 at 5:20 PM, Milind Parikh milindpar...@gmail.com wrote: I stand corrected. I show how Cassandra can be deployed

Re: EC2 - 2 regions

2011-03-23 Thread Milind Parikh
need other ports for basic setup, right? If anyone could get 'nodetool repair' working with this patch (across regions), let me know. It may be that I am doing something wrong. On Wed, Mar 23, 2011 at 1:08 AM, Milind Parikh milindpar...@gmail.com wrote: @aj are you sure...

Re: EC2 - 2 regions

2011-03-22 Thread Milind Parikh
@aj are you sure that all ports are accessible from all nodes? @sasha I think that being able to have the semantics of address aNAT address can enable security from a different perspective. Describing an overlay network will take long here. But that may solve your security concerns over the internet.

Re: EC2 - 2 regions

2011-03-21 Thread Milind Parikh
code. Dave Viner On Mon, Mar 21, 2011 at 9:41 AM, A J s5a...@gmail.com wrote: Thanks for sharing the document, Milind! Followed the instructions and it worked for me. On Mon, Mar 21, 2011 at 5:01 AM, Milind Parikh milindpar...@gmail.com wrote: Here's the document on Cassandra

Conflict resolution in Cassandra

2011-03-14 Thread Milind Parikh
https://docs.google.com/document/d/13Yc2t4d07290TdiRmSTchuAk9sbp4BeqOpqeYhbcDFM/edit?hl=en There was an excellent session on vector clocks and synchronous writes in Cassandra. Here are my gleanings out of it.