Yeah, I was about to suggest the compaction strategy too. Leveled
compaction sounds like a better fit when records are being updated.
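
For reference, switching the table to leveled compaction could look roughly like the statement below. This is only a sketch: it assumes the table is still named myprofile as in the original post, leaves every other table option untouched, and has not been run against your 2.2.x node.

```sql
-- Assumption: table name taken from the schema quoted further down the thread.
-- Only the compaction class is changed; all sub-options stay at their defaults.
ALTER TABLE myprofile
  WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
```

Expect extra compaction I/O right after the change while the existing sstables are reorganized into levels, so it is probably worth trying on the test node first.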

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 8 October 2015 at 22:35, Tyler Hobbs <ty...@datastax.com> wrote:

> Upgrade to 2.2.2.  Your sstables are probably not compacting due to
> CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>,
> which was fixed in 2.2.2.
>
> Additionally, you may want to look into using leveled compaction (
> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction).
>
> On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com>
> wrote:
>
>>
>> Hi,
>>
>> We are developing a system that computes a profile of the things it
>> observes. The observations come in the form of events. Each thing that it
>> observes has an id, and each thing has a set of subthings, each of which has
>> a measurement of some kind. There are roughly 500 subthings within each
>> thing. We receive events containing measurements of these 500 subthings
>> every 10 seconds or so.
>>
>> As we receive events, we read the old profile value, calculate the
>> new profile based on the new value, and save it back. We use the following
>> schema to hold the profile.
>>
>> CREATE TABLE myprofile (
>>     id text,
>>     month text,
>>     day text,
>>     hour text,
>>     subthings text,
>>     lastvalue double,
>>     count int,
>>     stddev double,
>>     PRIMARY KEY ((id, month, day, hour), subthings)
>> ) WITH CLUSTERING ORDER BY (subthings ASC);
>>
>>
>> This profile will then be used for certain analytics that can be run in the
>> context of the ‘thing’ or in the context of a specific thing and subthing.
>>
>> A profile can be defined as monthly, daily, or hourly. In the case of a
>> monthly profile, the month will be set to the current month (e.g. ‘Oct’)
>> and the day and hour will be set to the empty string ‘’.
>>
>>
>> The problem that we have observed is that over time (actually in just a
>> matter of hours) we see a huge degradation in query response time for the
>> monthly profile. At the start it responds in 10-100 ms, and after
>> a couple of hours it goes to 2000-3000 ms. If you leave it for a couple
>> of days you start experiencing read timeouts. The query is basically
>> just:
>>
>> select * from myprofile where id='1' and month='Oct' and day='' and
>> hour='';
>>
>> This will have only about 500 rows or so.
>>
>>
>> I believe that this is caused by the fact that there are multiple updates
>> done to this specific partition. So what do we think can be done to resolve
>> this?
>>
>> BTW, I am using Cassandra 2.2.1. And since this is a test, this is just
>> running on a single node.
>>
>>
>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
