Re: Order by for aggregated values

2017-06-06 Thread Nate McCall
> > > My application is a real-time application. It monitors devices in the network and displays the top N devices for various parameters averaged over a time period. A query may involve anywhere from 10 to 50k devices, and anywhere from 5 to 2000 intervals. We expect a query to take less

Reg:- Multi DC Configuration

2017-06-06 Thread @Nandan@
Hi, I am trying to set up Cassandra 3.9 across multiple DCs. Currently, I have 2 DCs with 3 and 2 nodes respectively. DC1 name: India, nodes: 192.16.0.1, 192.16.0.2, 192.16.0.3. DC2 name: USA, nodes: 172.16.0.1, 172.16.0.2. Please help me to know which files I need to make changes for
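The reply is not shown here, but one common approach (an assumption, not taken from this thread) is to set `endpoint_snitch: GossipingPropertyFileSnitch` in each node's cassandra.yaml and give every node a cassandra-rackdc.properties file naming its DC and rack. The sketch below only renders that per-node file from the layout in the question; the helper names and the rack name `RAC1` are hypothetical.

```python
# Hypothetical helper. Assumes GossipingPropertyFileSnitch, which reads the
# dc= and rack= keys from conf/cassandra-rackdc.properties on each node.
TOPOLOGY: dict[str, list[str]] = {
    "India": ["192.16.0.1", "192.16.0.2", "192.16.0.3"],
    "USA": ["172.16.0.1", "172.16.0.2"],
}


def rackdc_properties(dc: str, rack: str = "RAC1") -> str:
    """Render the contents of cassandra-rackdc.properties for one node."""
    return f"dc={dc}\nrack={rack}\n"


def per_node_files(topology: dict[str, list[str]]) -> dict[str, str]:
    """Map each node IP to the properties file it should carry."""
    return {ip: rackdc_properties(dc) for dc, ips in topology.items() for ip in ips}


if __name__ == "__main__":
    for ip, body in per_node_files(TOPOLOGY).items():
        print(f"# {ip}: conf/cassandra-rackdc.properties")
        print(body)
```

The DC names written here are the ones a keyspace's NetworkTopologyStrategy replication settings would then have to reference.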

Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
First, GROUP BY is only allowed on partition key and clustering columns, not on arbitrary columns. The internal implementation of GROUP BY tries to fetch data in clustering order to avoid having to re-sort it in memory, which would be very expensive. Second, GROUP BY works best when restricted
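A minimal Python sketch of why clustering order matters, using a hypothetical `(group_key, timestamp, value)` row shape rather than Cassandra's real internals. When rows already arrive sorted on the grouping key, each group's average can be streamed with only a running total and count. Grouping on any other column first requires every row to be sorted in memory.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator

# Hypothetical row shape: (group_key, timestamp, value)
Row = tuple[str, int, float]


def grouped_avg_in_clustering_order(rows: Iterable[Row]) -> Iterator[tuple[str, float]]:
    """Stream per-group averages from rows already sorted on the group key.

    No group is materialized: only a running total and count are kept.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        total, count = 0.0, 0
        for _, _, value in group:
            total += value
            count += 1
        yield key, total / count


def grouped_avg_by_arbitrary_column(rows: Iterable[Row], col: int) -> Iterator[tuple[object, float]]:
    """Group on a column the rows are NOT ordered by.

    sorted() materializes every row first: the in-memory re-sort that
    restricting GROUP BY to primary key columns avoids.
    """
    ordered = sorted(rows, key=itemgetter(col))
    for key, group in groupby(ordered, key=itemgetter(col)):
        values = [row[2] for row in group]
        yield key, sum(values) / len(values)
```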

Re: Partition range incremental repairs

2017-06-06 Thread Chris Stokesmore
Hi all, Wondering if anyone had any thoughts on this? At the moment the long-running repairs cause us to be running them on two nodes at once for a bit of time, which obviously increases the cluster load. On 2017-05-25 16:18 (+0100), Chris Stokesmore wrote: > Hi,

Re: Regular dropped READ messages

2017-06-06 Thread Vincent Rischmann
Hi Alexander. Yeah, the minor GCs I see are usually around 300ms but sometimes jump to 1s or even more. Hardware specs are: 8-core CPUs, 32 GB of RAM, 4 SSDs in hardware RAID 0 (around 3TB of space per node). GC settings: -Xmx12G -Xms12G -XX:+UseG1GC -

Re: Regular dropped READ messages

2017-06-06 Thread Alexander Dejanovski
Hi Vincent, it is very clear, thanks for all the info. I would not stick with G1 in your case, as it requires much more heap to perform correctly (>24GB). CMS/ParNew should be much more efficient here and I would go with some settings I usually apply on big workloads: 16GB heap / 6GB new gen /

Re: Partition range incremental repairs

2017-06-06 Thread Anuj Wadehra
Hi Chris, Using pr with incremental repairs does not make sense. Primary range repair is an optimization over full repair. If you run full repair on an n-node cluster with RF=3, you would be repairing each piece of data three times. E.g. in a 5-node cluster with RF=3, a range may exist on nodes A, B and C.
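The "three times" arithmetic can be checked with a toy ring model. This is a simplification: it assumes one token per node and SimpleStrategy-style placement (each range lives on its primary owner plus the next RF-1 nodes clockwise), and it ignores vnodes and NetworkTopologyStrategy. All names are hypothetical. When every node runs a full repair, each range is repaired once per replica. With primary-range (-pr) repair, each range is repaired exactly once.

```python
from collections import Counter

NODES = "ABCDE"  # hypothetical 5-node ring, one token per node


def replicas(primary_index: int, n_nodes: int, rf: int) -> list[int]:
    """Primary owner plus the next rf-1 nodes clockwise (SimpleStrategy-like)."""
    return [(primary_index + k) % n_nodes for k in range(rf)]


def repair_counts(n_nodes: int, rf: int, primary_range_only: bool) -> Counter:
    """How many times each range is repaired when every node runs repair once."""
    counts: Counter = Counter()
    for node in range(n_nodes):      # every node runs repair once
        for rng in range(n_nodes):   # range `rng` has node `rng` as primary owner
            if primary_range_only:
                repairs_it = rng == node
            else:
                repairs_it = node in replicas(rng, n_nodes, rf)
            if repairs_it:
                counts[rng] += 1
    return counts


if __name__ == "__main__":
    print("range 0 lives on:", [NODES[i] for i in replicas(0, 5, 3)])
    print("full:", dict(repair_counts(5, 3, primary_range_only=False)))
    print("-pr: ", dict(repair_counts(5, 3, primary_range_only=True)))
```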

Re: Regular dropped READ messages

2017-06-06 Thread Vincent Rischmann
Thanks Alexander for the help, lots of good info in there. I'll try to switch back to CMS and see how it fares. On Tue, Jun 6, 2017, at 05:06 PM, Alexander Dejanovski wrote: > Hi Vincent, it is very clear, thanks for all the info. I would not stick with G1 in your case, as it requires

RE: Order by for aggregated values

2017-06-06 Thread Roger Fischer (CW)
Hi DuyHai, this is in response to the other points in your response. My application is a real-time application. It monitors devices in the network and displays the top N devices for various parameters averaged over a time period. A query may involve anywhere from 10 to 50k devices, and
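Without server-side ORDER BY on aggregated values, the ranking has to happen in the client. Here is a minimal sketch, assuming the per-interval samples have already been read from Cassandra as hypothetical `(device_id, value)` pairs. It keeps one running sum and count per device, then uses `heapq.nlargest` so only the top N results are ranked. Memory grows with the number of devices (up to 50k here), not with devices × intervals.

```python
import heapq
from collections import defaultdict
from typing import Iterable


def top_n_by_average(samples: Iterable[tuple[str, float]], n: int) -> list[tuple[str, float]]:
    """Average each device's samples, then return the n highest averages, descending."""
    sums: defaultdict[str, float] = defaultdict(float)
    counts: defaultdict[str, int] = defaultdict(int)
    for device, value in samples:
        sums[device] += value
        counts[device] += 1
    averages = ((device, sums[device] / counts[device]) for device in sums)
    # nlargest keeps a heap of size n instead of sorting every device.
    return heapq.nlargest(n, averages, key=lambda item: item[1])
```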

Re: Order by for aggregated values

2017-06-06 Thread Jeff Jirsa
On 2017-06-05 19:00 (-0700), "Roger Fischer (CW)" wrote: > Hello, is there any intent to support "order by" and "limit" on aggregated values? For time series data, top n queries are quite common. Group-by was the first step towards supporting such queries, but

Local_serial >> Adding nodes

2017-06-06 Thread vasu gunja
Hi All, We have a 2-DC setup, each DC consisting of 20-odd nodes, and recently we decided to add 6 more nodes to DC1. We are using LWTs, and the application drivers are configured to use LOCAL_SERIAL. As we are adding multiple nodes at a time, we used the option "-Dcassandra.consistent.rangemovement=false"

Regular dropped READ messages

2017-06-06 Thread Vincent Rischmann
Hi, we have a cluster of 11 nodes running Cassandra 2.2.9 where we regularly get READ messages dropped: "READ messages were dropped in last 5000 ms: 974 for internal timeout and 0 for cross node timeout". Looking at the logs, some are logged at the same time as Old Gen GCs. These GCs all take
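To track how often drops occur, and to line them up against GC pauses, the dropped-message line quoted in this post can be parsed. This small sketch assumes only the message text shown above. It uses `re.search`, so any timestamp or level prefix on the real log line is skipped. The function and tuple names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern built from the message text quoted in the post.
DROPPED_RE = re.compile(
    r"(?P<verb>\w+) messages were dropped in last (?P<window_ms>\d+) ms: "
    r"(?P<internal>\d+) for internal timeout and (?P<cross_node>\d+) for cross node timeout"
)


class DroppedMessages(NamedTuple):
    verb: str        # message type, e.g. READ
    window_ms: int   # length of the reporting window
    internal: int    # drops due to internal timeout
    cross_node: int  # drops due to cross node timeout


def parse_dropped(line: str) -> Optional[DroppedMessages]:
    """Return the parsed counts, or None if the line is not a dropped-message report."""
    m = DROPPED_RE.search(line)
    if m is None:
        return None
    return DroppedMessages(m["verb"], int(m["window_ms"]), int(m["internal"]), int(m["cross_node"]))
```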

Re: Regular dropped READ messages

2017-06-06 Thread Alexander Dejanovski
Hi Vincent, dropped messages are indeed common in case of long GC pauses. Having 4s to 6s pauses is not normal and is the sign of an unhealthy cluster. Minor GCs are usually faster but you can have long ones too. If you can share your hardware specs along with your current GC settings (CMS or

Re: Partition range incremental repairs

2017-06-06 Thread Chris Stokesmore
Thank you for the excellent and clear description of the different versions of repair, Anuj; that has cleared up what I expect to be happening. The problem now is that in our cluster we are running repairs with options (parallelism: parallel, primary range: false, incremental: true, job threads: 1,

RE: Order by for aggregated values

2017-06-06 Thread Roger Fischer (CW)
Hi DuyHai, thanks for your response. I understand the reservations about implementing sorting in Cassandra. But I think it is analogous to filtering. It may be bad in the general case, but can be useful for particular use cases. If Cassandra does not provide “order-by”, then the ordering has

Re: Partition range incremental repairs

2017-06-06 Thread Anuj Wadehra
Hi Chris, Can you share the following info: 1. The exact repair commands you use for inc repair and pr repair. 2. Repair time should be measured at the cluster level for inc repair. So, what's the total time it takes to run repair on all nodes for incremental vs pr repairs? 3. You are repairing one DC, DC3.

Re: Order by for aggregated values

2017-06-06 Thread Jonathan Haddad
Unfortunately this feature falls in a category of *incredibly useful* features that have gotten the -1 over the years because it doesn't scale like we want it to. As far as basic aggregations go, it's remarkably trivial to roll up 100K-1MM items using very little memory, so at first it seems like
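To illustrate the "very little memory" point: a basic roll-up only needs a fixed-size running state, however many items stream past. This is a minimal sketch with hypothetical names, computing count, sum, min, max and mean in one pass without retaining any input.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningAggregate:
    """Constant-size state for a streaming roll-up; input values are not retained."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def roll_up(values: Iterable[float]) -> RunningAggregate:
    """Fold any iterable (even a 1M-item generator) into one RunningAggregate."""
    agg = RunningAggregate()
    for v in values:
        agg.add(v)
    return agg
```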

Re: Understanding the limitation to only one non-PK column in MV-PK

2017-06-06 Thread DuyHai Doan
The full explanation of why only one non-PK column can be used in the PK of an MV is here: https://skillsmatter.com/skillscasts/7446-cassandra-udf-and-materialised-views-in-depth Skip to 19:18 for the explanation. On Mon, May 8, 2017 at 8:08 PM, Fridtjof Sander <fridtjof.san...@googlemail.com> wrote:

Re: Partition range incremental repairs

2017-06-06 Thread Jonathan Haddad
I can't recommend *anyone* use incremental repair, as there are some pretty horrible bugs in it that can cause Merkle trees to wildly mismatch & result in massive overstreaming. Check out https://issues.apache.org/jira/browse/CASSANDRA-9143. TL;DR: Do not use incremental repair before 4.0. On Tue,

Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
The problem is not that it's not feasible from the Cassandra side; it is. The problem is that when doing an arbitrary ORDER BY, Cassandra needs to resort to in-memory sorting of a potentially huge amount of data --> more pressure on the heap --> impact on cluster stability. Whereas delegating this kind of job to