"Should the data be sorted by my time column regardless of the compaction
strategy?" --> It is.

What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
compacted together with a new chunk in SSTABLE-2 containing fresh data, so
the resulting SSTable will contain tombstones AND fresh data inside the
same partition, but of course sorted by the clustering column "time".
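
To illustrate the point, here is a toy sketch in Python. It is not how
Cassandra implements compaction; it only models one partition as runs of
cells sorted by the clustering column. The names Cell and compact and the
cell values are made up for the example, and it assumes the expired cells
have not yet been purged (for instance because gc_grace_seconds has not
elapsed).

```python
import heapq
from typing import NamedTuple


class Cell(NamedTuple):
    time: int      # value of the clustering column "time"
    expired: bool  # True if the TTL has elapsed (cell behaves like a tombstone)


def compact(*sstables):
    """Merge runs of one partition; each run is already sorted by time."""
    return list(heapq.merge(*sstables, key=lambda c: c.time))


# Hypothetical contents of the same partition in two SSTables.
sstable_12 = [Cell(t, expired=True) for t in (1, 2, 3)]     # old, expired chunk
sstable_2 = [Cell(t, expired=False) for t in (10, 11, 12)]  # fresh chunk

merged = compact(sstable_12, sstable_2)
times = [c.time for c in merged]
tombstones = sum(c.expired for c in merged)

print(times)       # still ordered by the clustering column
print(tombstones)  # expired cells now share an SSTable with fresh ones
```

The merged run stays sorted by "time", yet expired and live cells end up
side by side in the same partition of the same SSTable.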

On Sun, Jan 29, 2017 at 8:55 PM, John Sanda <john.sa...@gmail.com> wrote:

>> Since STCS does not sort data based on timestamp, your wide partition may
>> span over multiple SSTables and inside each SSTable, old data (+
>> tombstones) may sit on the same partition as newer data.
>
>
> Should the data be sorted by my time column regardless of the compaction
> strategy? I didn't think that the column timestamp came into play with
> respect to sorting. I have been able to review some SSTables with
> sstablemetadata and I can see that old/expired data is definitely living
> with live data.
>
>
> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
>> Ok so give it a try with TWCS. Since STCS does not sort data based on
>> timestamp, your wide partition may span over multiple SSTables and inside
>> each SSTable, old data (+ tombstones) may sit on the same partition as
>> newer data.
>>
>> When reading by slice, even if you request only fresh data, Cassandra has
>> to scan over a lot of tombstones to fetch the correct range of data, hence
>> your issue.
>>
>> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote:
>>
>>> It was with STCS. It was on a 2.x version before TWCS was available.
>>>
>>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan <doanduy...@gmail.com>
>>> wrote:
>>>
>>>> Did you get this overwhelming tombstone behavior with STCS or with
>>>> TWCS?
>>>>
>>>> If you're using DTCS, beware of its weird behavior and tricky
>>>> configuration.
>>>>
>>>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com>
>>>> wrote:
>>>>
>>>> Your partitioning key is text. If you have multiple entries per id you
>>>> are likely hitting older cells that have expired. Descending only affects
>>>> how the data is stored on disk; if you have to read the whole partition to
>>>> find whichever time you are querying for, you could potentially hit
>>>> tombstones in other SSTables that contain the same "id". As mentioned
>>>> previously, you need to add a time bucket to your partitioning key and
>>>> definitely use DTCS/TWCS.
>>>>
>>>>
>>>> As I mentioned previously, the UI only queries recent data, e.g., the
>>>> past hour, past two hours, past day, past week. The UI does not query for
>>>> anything older than the TTL which is 7 days. My understanding and
>>>> expectation was that Cassandra would only scan live cells. The UI is a
>>>> separate application that I do not maintain, so I am not 100% certain about
>>>> the queries. I have been told that it does not query for anything older
>>>> than 7 days.
>>>>
>>>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves <k...@instaclustr.com>
>>>> wrote:
>>>>
>>>>
>>>> Your partitioning key is text. If you have multiple entries per id you
>>>> are likely hitting older cells that have expired. Descending only affects
>>>> how the data is stored on disk; if you have to read the whole partition to
>>>> find whichever time you are querying for, you could potentially hit
>>>> tombstones in other SSTables that contain the same "id". As mentioned
>>>> previously, you need to add a time bucket to your partitioning key and
>>>> definitely use DTCS/TWCS.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> - John
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>
>
>
> --
>
> - John
>
