Yeah, I was about to suggest the compaction strategy too. Leveled compaction sounds like a better fit when records are being updated.
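For reference, switching the table to leveled compaction could look roughly like the statement below. This is only a sketch: it assumes the table name myprofile from the quoted schema, leaves every compaction sub-option at its default, and should be tried on the test node first, since the change triggers recompaction of the existing sstables.

```cql
-- Sketch: switch the existing table to leveled compaction.
-- 'myprofile' is the table from the quoted schema; all other
-- compaction options are left at their defaults (an assumption).
ALTER TABLE myprofile
  WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
```

Afterwards, DESCRIBE TABLE myprofile in cqlsh should show the new compaction class.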
Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 8 October 2015 at 22:35, Tyler Hobbs <ty...@datastax.com> wrote:

> Upgrade to 2.2.2. Your sstables are probably not compacting due to
> CASSANDRA-10270 <https://issues.apache.org/jira/browse/CASSANDRA-10270>,
> which was fixed in 2.2.2.
>
> Additionally, you may want to look into using leveled compaction (
> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction).
>
> On Thu, Oct 8, 2015 at 4:27 PM, Nazario Parsacala <dodongj...@gmail.com>
> wrote:
>
>> Hi,
>>
>> We are developing a system that computes a profile of the things that it
>> observes. The observations come in the form of events. Each thing that it
>> observes has an id, and each thing has a set of subthings, each of which
>> has a measurement of some kind. There are roughly 500 subthings within each
>> thing. We receive events containing measurements of these 500 subthings
>> every 10 seconds or so.
>>
>> As we receive events, we read the old profile value, calculate the
>> new profile based on the new value, and save it back. We use the following
>> schema to hold the profile.
>>
>> CREATE TABLE myprofile (
>>     id text,
>>     month text,
>>     day text,
>>     hour text,
>>     subthings text,
>>     lastvalue double,
>>     count int,
>>     stddev double,
>>     PRIMARY KEY ((id, month, day, hour), subthings)
>> ) WITH CLUSTERING ORDER BY (subthings ASC);
>>
>> This profile will then be used for certain analytics, which can be run in
>> the context of the 'thing' or in the context of a specific thing and
>> subthing.
>>
>> A profile can be defined as monthly, daily, or hourly. In the monthly case,
>> the month will be set to the current month (e.g. 'Oct') and the day and
>> hour will be set to the empty string ('').
>>
>> The problem that we have observed is that over time (actually in just a
>> matter of hours) we see a huge degradation of query response time for the
>> monthly profile. At the start it responds in 10-100 ms, and after
>> a couple of hours it goes to 2000-3000 ms. If you leave it for a couple
>> of days you will start experiencing read timeouts. The query is basically
>> just:
>>
>> select * from myprofile where id='1' and month='Oct' and day='' and
>> hour=''
>>
>> This returns only about 500 rows or so.
>>
>> I believe that this is caused by the fact that there are multiple updates
>> done to this specific partition. So what do we think can be done to resolve
>> this?
>>
>> BTW, I am using Cassandra 2.2.1. And since this is a test, this is just
>> running on a single node.
>>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>