RE: [Multi DC] Old Data Not syncing from Existing cluster to new Cluster

2017-01-29 Thread Abhishek Kumar Maheshwari
But how will I tell the rebuild command the source DC if I have more than 2 DCs?

@Dikang, yes, I ran the command, and it has done something strange now:

Datacenter: DRPOCcluster

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns  Host ID                               Rack
UN  172.29.XX.XXX   140.16 GB  256     ?     badf985b-37da-4735-b468-8d3a058d4b60  01
UN  172.29.XX.XXX   82.04 GB   256     ?     317061b2-c19f-44ba-a776-bcd91c70bbdd  03
UN  172.29.XX.XXX   85.29 GB   256     ?     9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c  02
Datacenter: dc_india

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns  Host ID                               Rack
UN  172.26.XX.XXX   79.09 GB   256     ?     3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
UN  172.26.XX.XXX   79.39 GB   256     ?     7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2
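
Note: Owns shows "?" because nodetool status was run without a keyspace argument. Running it per keyspace (for example wls, taken from the ALTER statements below) reports effective ownership and makes the two DCs easier to compare:

nodetool status wls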



In the source DC (dc_india) we have roughly 79 GB of data per node, but in the
new DC each node has more than 79 GB and the seed node has nearly twice as
much. Below is the replication configuration:
Data keyspaces:
alter KEYSPACE wls WITH replication = {'class': 'NetworkTopologyStrategy', 
'DRPOCcluster': '3','dc_india':'2'}  AND durable_writes = true;
alter KEYSPACE adlog WITH replication = {'class': 'NetworkTopologyStrategy', 
'DRPOCcluster': '3','dc_india':'2'}  AND durable_writes = true;

New DC ('DRPOCcluster') system keyspaces:

alter KEYSPACE system_distributed WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND 
durable_writes = true;
alter KEYSPACE system_auth WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND 
durable_writes = true;
alter KEYSPACE system_traces WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND 
durable_writes = true;
alter KEYSPACE "OpsCenter" WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '3','dc_india':'0'}  AND 
durable_writes = true;

Old DC ('dc_india') system keyspaces:

alter KEYSPACE system_distributed WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND 
durable_writes = true;
alter KEYSPACE system_auth WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND 
durable_writes = true;
alter KEYSPACE system_traces WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND 
durable_writes = true;
alter KEYSPACE "OpsCenter" WITH replication = {'class': 
'NetworkTopologyStrategy', 'DRPOCcluster': '0','dc_india':'2'}  AND 
durable_writes = true;

Why is this happening? Did I do something wrong?

Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA
Please do not print this email unless it is absolutely necessary. Spread
environmental awareness.

From: kurt greaves [mailto:k...@instaclustr.com]
Sent: Saturday, January 28, 2017 3:27 AM
To: user@cassandra.apache.org
Subject: Re: [Multi DC] Old Data Not syncing from Existing cluster to new 
Cluster

What Dikang said: in your original email you are passing -dc to rebuild, which
is incorrect. Simply run nodetool rebuild <source_dc> from each of the nodes in
the new DC.
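
For example, on each of the three DRPOCcluster nodes (a sketch using the DC
names from this thread, run once the keyspaces replicate to the new DC):

nodetool rebuild dc_india

The positional argument names the existing DC to stream from, which is also
how you choose the source when there are more than two DCs.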

On 28 Jan 2017 07:50, "Dikang Gu" wrote:
Have you run 'nodetool rebuild dc_india' on the new nodes?

On Tue, Jan 24, 2017 at 7:51 AM, Benjamin Roth wrote:
Have you also altered RF of system_distributed as stated in the tutorial?

2017-01-24 16:45 GMT+01:00 Abhishek Kumar Maheshwari:
My Mistake,

Both clusters are up and running.

Datacenter: DRPOCcluster

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load     Tokens  Owns  Host ID                               Rack
UN  172.29.XX.XX   1.65 GB  256     ?     badf985b-37da-4735-b468-8d3a058d4b60  01
UN  172.29.XX.XX   1.64 GB  256     ?     317061b2-c19f-44ba-a776-bcd91c70bbdd  03
UN  172.29.XX.XX   1.64 GB  256     ?     9bf0d1dc-6826-4f3b-9c56-cec0c9ce3b6c  02
Datacenter: dc_india

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns  Host ID                               Rack
UN  172.26.XX.XX   79.90 GB  256     ?     3e8133ed-98b5-418d-96b5-690a1450cd30  RACK1
UN  172.26.XX.XX   80.21 GB  256     ?     7d3f5b25-88f9-4be7-b0f5-746619153543  RACK2

Thanks & Regards,
Abhishek Kumar Maheshwari
+91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company
FC 

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
In theory, you're right and Cassandra should be able to skip reading cells
having time < 50. But that's all theory; in practice Cassandra reads chunks of
xxx kilobytes' worth of data (I don't remember the exact value of xxx, maybe
64 KB or far less), so you may end up reading tombstones.
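
The chunk referred to here is most likely the compression chunk, which
defaults to 64 KB and is configurable per table. A minimal sketch using the
Cassandra 2.x option names (the table name is hypothetical; 3.x renames the
options to 'class' and 'chunk_length_in_kb'):

ALTER TABLE metrics.raw_metrics
  WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': '64'};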

On Sun, Jan 29, 2017 at 9:24 PM, John Sanda  wrote:

> Thanks for the clarification. Let's say I have a partition in an SSTable
> where the values of time range from 100 to 10 and everything < 50 is
> expired. If I do a query with time < 100 and time >= 50, are there
> scenarios in which Cassandra will have to read cells where time < 50? In
> particular I am wondering if compression might have any affect.
>
> On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan  wrote:
>
>> "Should the data be sorted by my time column regardless of the
>> compaction strategy" --> It does
>>
>> What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
>> compacted together with a new chunk of SSTABLE-2 containing fresh data so
>> in the new resulting SSTable will contain tombstones AND fresh data inside
>> the same partition, but of course sorted by clustering column "time".
>>
>> On Sun, Jan 29, 2017 at 8:55 PM, John Sanda  wrote:
>>
>> Since STCS does not sort data based on timestamp, your wide partition may
>> span over multiple SSTables and inside each SSTable, old data (+
>> tombstones) may sit on the same partition as newer data.
>>
>>
>> Should the data be sorted by my time column regardless of the compaction
>> strategy? I didn't think that the column timestamp came into play with
>> respect to sorting. I have been able to review some SSTables with
>> sstablemetadata and I can see that old/expired data is definitely living
>> with live data.
>>
>>
>> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan 
>> wrote:
>>
>> Ok so give it a try with TWCS. Since STCS does not sort data based on
>> timestamp, your wide partition may span over multiple SSTables and inside
>> each SSTable, old data (+ tombstones) may sit on the same partition as
>> newer data.
>>
>> When reading by slice, even if you request for fresh data, Cassandra has
>> to scan over a lot tombstones to fetch the correct range of data thus your
>> issue
>>
>> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda  wrote:
>>
>> It was with STCS. It was on a 2.x version before TWCS was available.
>>
>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan 
>> wrote:
>>
>> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>>
>> If you're using DTCS, beware of its weird behavior and tricky
>> configuration.
>>
>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda  wrote:
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>> As I mentioned previously, the UI only queries recent data, e.g., the
>> past hour, past two hours, past day, past week. The UI does not query for
>> anything older than the TTL which is 7 days. My understanding and
>> expectation was that Cassandra would only scan live cells. The UI is a
>> separate application that I do not maintain, so I am not 100% certain about
>> the queries. I have been told that it does not query for anything older
>> than 7 days.
>>
>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
>> wrote:
>>
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>>
>>
>>
>> --
>>
>> - John
>>
>>
>>
>> --
>>
>> - John
>>
>>
>>
>>
>>
>>
>>
>>


Re: Time series data model and tombstones

2017-01-29 Thread Jonathan Haddad
Check out our post on how to use TWCS before 3.0.

http://thelastpickle.com/blog/2017/01/10/twcs-part2.html
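
For reference, enabling the backported strategy on a 2.x table looks roughly
like this (a sketch; the table name is hypothetical, and the exact class name
depends on the jar you build, so check the post for your version):

ALTER TABLE metrics.raw_metrics
  WITH compaction = {'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': '1'};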

On Sun, Jan 29, 2017 at 11:20 AM John Sanda  wrote:

> It was with STCS. It was on a 2.x version before TWCS was available.
>
> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan  wrote:
>
> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>
> If you're using DTCS, beware of its weird behavior and tricky
> configuration.
>
> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda  wrote:
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
>
>
> As I mentioned previously, the UI only queries recent data, e.g., the past
> hour, past two hours, past day, past week. The UI does not query for
> anything older than the TTL which is 7 days. My understanding and
> expectation was that Cassandra would only scan live cells. The UI is a
> separate application that I do not maintain, so I am not 100% certain about
> the queries. I have been told that it does not query for anything older
> than 7 days.
>
> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
> wrote:
>
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
>
>
>
>
>
> --
>
> - John
>
>
>
>
>
>
>
>


Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
Thanks for the clarification. Let's say I have a partition in an SSTable
where the values of time range from 100 to 10 and everything < 50 is
expired. If I do a query with time < 100 and time >= 50, are there
scenarios in which Cassandra will have to read cells where time < 50? In
particular I am wondering if compression might have any effect.

On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan  wrote:

> "Should the data be sorted by my time column regardless of the compaction
> strategy" --> It does
>
> What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
> compacted together with a new chunk of SSTABLE-2 containing fresh data so
> in the new resulting SSTable will contain tombstones AND fresh data inside
> the same partition, but of course sorted by clustering column "time".
>
> On Sun, Jan 29, 2017 at 8:55 PM, John Sanda  wrote:
>
> Since STCS does not sort data based on timestamp, your wide partition may
> span over multiple SSTables and inside each SSTable, old data (+
> tombstones) may sit on the same partition as newer data.
>
>
> Should the data be sorted by my time column regardless of the compaction
> strategy? I didn't think that the column timestamp came into play with
> respect to sorting. I have been able to review some SSTables with
> sstablemetadata and I can see that old/expired data is definitely living
> with live data.
>
>
> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan  wrote:
>
> Ok so give it a try with TWCS. Since STCS does not sort data based on
> timestamp, your wide partition may span over multiple SSTables and inside
> each SSTable, old data (+ tombstones) may sit on the same partition as
> newer data.
>
> When reading by slice, even if you request for fresh data, Cassandra has
> to scan over a lot tombstones to fetch the correct range of data thus your
> issue
>
> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda  wrote:
>
> It was with STCS. It was on a 2.x version before TWCS was available.
>
> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan  wrote:
>
> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>
> If you're using DTCS, beware of its weird behavior and tricky
> configuration.
>
> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda  wrote:
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
>
>
> As I mentioned previously, the UI only queries recent data, e.g., the past
> hour, past two hours, past day, past week. The UI does not query for
> anything older than the TTL which is 7 days. My understanding and
> expectation was that Cassandra would only scan live cells. The UI is a
> separate application that I do not maintain, so I am not 100% certain about
> the queries. I have been told that it does not query for anything older
> than 7 days.
>
> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
> wrote:
>
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
>
>
>
>
>
> --
>
> - John
>
>
> --
>
> - John
>
>
>
>
>
>
>
>


Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
"Should the data be sorted by my time column regardless of the compaction
strategy" --> It does

What I mean is that an old "chunk" of expired data in SSTABLE-12 may be
compacted together with a new chunk of SSTABLE-2 containing fresh data, so the
resulting SSTable will contain tombstones AND fresh data inside the same
partition, but of course sorted by the clustering column "time".

On Sun, Jan 29, 2017 at 8:55 PM, John Sanda  wrote:

> Since STCS does not sort data based on timestamp, your wide partition may
>> span over multiple SSTables and inside each SSTable, old data (+
>> tombstones) may sit on the same partition as newer data.
>
>
> Should the data be sorted by my time column regardless of the compaction
> strategy? I didn't think that the column timestamp came into play with
> respect to sorting. I have been able to review some SSTables with
> sstablemetadata and I can see that old/expired data is definitely living
> with live data.
>
>
> On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan  wrote:
>
>> Ok so give it a try with TWCS. Since STCS does not sort data based on
>> timestamp, your wide partition may span over multiple SSTables and inside
>> each SSTable, old data (+ tombstones) may sit on the same partition as
>> newer data.
>>
>> When reading by slice, even if you request for fresh data, Cassandra has
>> to scan over a lot tombstones to fetch the correct range of data thus your
>> issue
>>
>> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda  wrote:
>>
>>> It was with STCS. It was on a 2.x version before TWCS was available.
>>>
>>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan 
>>> wrote:
>>>
 Did you get this Overwhelming tombstonne behavior with STCS or with
 TWCS ?

 If you're using DTCS, beware of its weird behavior and tricky
 configuration.

 On Sun, Jan 29, 2017 at 3:52 PM, John Sanda 
 wrote:

 Your partitioning key is text. If you have multiple entries per id you
 are likely hitting older cells that have expired. Descending only affects
 how the data is stored on disk, if you have to read the whole partition to
 find whichever time you are querying for you could potentially hit
 tombstones in other SSTables that contain the same "id". As mentioned
 previously, you need to add a time bucket to your partitioning key and
 definitely use DTCS/TWCS.


 As I mentioned previously, the UI only queries recent data, e.g., the
 past hour, past two hours, past day, past week. The UI does not query for
 anything older than the TTL which is 7 days. My understanding and
 expectation was that Cassandra would only scan live cells. The UI is a
 separate application that I do not maintain, so I am not 100% certain about
 the queries. I have been told that it does not query for anything older
 than 7 days.

 On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
 wrote:


 Your partitioning key is text. If you have multiple entries per id you
 are likely hitting older cells that have expired. Descending only affects
 how the data is stored on disk, if you have to read the whole partition to
 find whichever time you are querying for you could potentially hit
 tombstones in other SSTables that contain the same "id". As mentioned
 previously, you need to add a time bucket to your partitioning key and
 definitely use DTCS/TWCS.





 --

 - John








>>
>
>
> --
>
> - John
>


Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
>
> Since STCS does not sort data based on timestamp, your wide partition may
> span over multiple SSTables and inside each SSTable, old data (+
> tombstones) may sit on the same partition as newer data.


Should the data be sorted by my time column regardless of the compaction
strategy? I didn't think that the column timestamp came into play with
respect to sorting. I have been able to review some SSTables with
sstablemetadata and I can see that old/expired data is definitely living
with live data.


On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan  wrote:

> Ok so give it a try with TWCS. Since STCS does not sort data based on
> timestamp, your wide partition may span over multiple SSTables and inside
> each SSTable, old data (+ tombstones) may sit on the same partition as
> newer data.
>
> When reading by slice, even if you request for fresh data, Cassandra has
> to scan over a lot tombstones to fetch the correct range of data thus your
> issue
>
> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda  wrote:
>
>> It was with STCS. It was on a 2.x version before TWCS was available.
>>
>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan 
>> wrote:
>>
>>> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS
>>> ?
>>>
>>> If you're using DTCS, beware of its weird behavior and tricky
>>> configuration.
>>>
>>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda 
>>> wrote:
>>>
>>> Your partitioning key is text. If you have multiple entries per id you
>>> are likely hitting older cells that have expired. Descending only affects
>>> how the data is stored on disk, if you have to read the whole partition to
>>> find whichever time you are querying for you could potentially hit
>>> tombstones in other SSTables that contain the same "id". As mentioned
>>> previously, you need to add a time bucket to your partitioning key and
>>> definitely use DTCS/TWCS.
>>>
>>>
>>> As I mentioned previously, the UI only queries recent data, e.g., the
>>> past hour, past two hours, past day, past week. The UI does not query for
>>> anything older than the TTL which is 7 days. My understanding and
>>> expectation was that Cassandra would only scan live cells. The UI is a
>>> separate application that I do not maintain, so I am not 100% certain about
>>> the queries. I have been told that it does not query for anything older
>>> than 7 days.
>>>
>>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
>>> wrote:
>>>
>>>
>>> Your partitioning key is text. If you have multiple entries per id you
>>> are likely hitting older cells that have expired. Descending only affects
>>> how the data is stored on disk, if you have to read the whole partition to
>>> find whichever time you are querying for you could potentially hit
>>> tombstones in other SSTables that contain the same "id". As mentioned
>>> previously, you need to add a time bucket to your partitioning key and
>>> definitely use DTCS/TWCS.
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> - John
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>


-- 

- John


Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Ok so give it a try with TWCS. Since STCS does not sort data based on
timestamp, your wide partition may span over multiple SSTables and inside
each SSTable, old data (+ tombstones) may sit on the same partition as
newer data.

When reading by slice, even if you request only fresh data, Cassandra has to
scan over a lot of tombstones to fetch the correct range of data, hence your
issue.

On Sun, Jan 29, 2017 at 8:19 PM, John Sanda  wrote:

> It was with STCS. It was on a 2.x version before TWCS was available.
>
> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan  wrote:
>
>> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>>
>> If you're using DTCS, beware of its weird behavior and tricky
>> configuration.
>>
>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda  wrote:
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>> As I mentioned previously, the UI only queries recent data, e.g., the
>> past hour, past two hours, past day, past week. The UI does not query for
>> anything older than the TTL which is 7 days. My understanding and
>> expectation was that Cassandra would only scan live cells. The UI is a
>> separate application that I do not maintain, so I am not 100% certain about
>> the queries. I have been told that it does not query for anything older
>> than 7 days.
>>
>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
>> wrote:
>>
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>>
>>
>>
>>
>> --
>>
>> - John
>>
>>
>>
>>
>>
>>
>>
>>


Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
It was with STCS. It was on a 2.x version before TWCS was available.

On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan  wrote:

> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS ?
>
> If you're using DTCS, beware of its weird behavior and tricky
> configuration.
>
> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda  wrote:
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
>
>
> As I mentioned previously, the UI only queries recent data, e.g., the past
> hour, past two hours, past day, past week. The UI does not query for
> anything older than the TTL which is 7 days. My understanding and
> expectation was that Cassandra would only scan live cells. The UI is a
> separate application that I do not maintain, so I am not 100% certain about
> the queries. I have been told that it does not query for anything older
> than 7 days.
>
> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
> wrote:
>
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
>
>
>
>
>
> --
>
> - John
>
>
>
>
>
>
>
>


Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Did you get this overwhelming tombstone behavior with STCS or with TWCS?

If you're using DTCS, beware of its weird behavior and tricky configuration.

On Sun, Jan 29, 2017 at 3:52 PM, John Sanda  wrote:

> Your partitioning key is text. If you have multiple entries per id you are
>> likely hitting older cells that have expired. Descending only affects how
>> the data is stored on disk, if you have to read the whole partition to find
>> whichever time you are querying for you could potentially hit tombstones in
>> other SSTables that contain the same "id". As mentioned previously, you
>> need to add a time bucket to your partitioning key and definitely use
>> DTCS/TWCS.
>
>
> As I mentioned previously, the UI only queries recent data, e.g., the past
> hour, past two hours, past day, past week. The UI does not query for
> anything older than the TTL which is 7 days. My understanding and
> expectation was that Cassandra would only scan live cells. The UI is a
> separate application that I do not maintain, so I am not 100% certain about
> the queries. I have been told that it does not query for anything older
> than 7 days.
>
> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves 
> wrote:
>
>>
>> Your partitioning key is text. If you have multiple entries per id you
>> are likely hitting older cells that have expired. Descending only affects
>> how the data is stored on disk, if you have to read the whole partition to
>> find whichever time you are querying for you could potentially hit
>> tombstones in other SSTables that contain the same "id". As mentioned
>> previously, you need to add a time bucket to your partitioning key and
>> definitely use DTCS/TWCS.
>>
>
>
>
> --
>
> - John
>


Re: Time series data model and tombstones

2017-01-29 Thread John Sanda
>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.


As I mentioned previously, the UI only queries recent data, e.g., the past
hour, past two hours, past day, past week. The UI does not query for
anything older than the TTL which is 7 days. My understanding and
expectation was that Cassandra would only scan live cells. The UI is a
separate application that I do not maintain, so I am not 100% certain about
the queries. I have been told that it does not query for anything older
than 7 days.

On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves  wrote:

>
> Your partitioning key is text. If you have multiple entries per id you are
> likely hitting older cells that have expired. Descending only affects how
> the data is stored on disk, if you have to read the whole partition to find
> whichever time you are querying for you could potentially hit tombstones in
> other SSTables that contain the same "id". As mentioned previously, you
> need to add a time bucket to your partitioning key and definitely use
> DTCS/TWCS.
>



-- 

- John


Re: Time series data model and tombstones

2017-01-29 Thread kurt greaves
Your partitioning key is text. If you have multiple entries per id, you are
likely hitting older cells that have expired. Descending only affects how the
data is stored on disk; if you have to read the whole partition to find
whichever time you are querying for, you could potentially hit tombstones in
other SSTables that contain the same "id". As mentioned previously, you need
to add a time bucket to your partitioning key and definitely use DTCS/TWCS.
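
A sketch of what that could look like (table, column names, and bucket
granularity are hypothetical and should be sized to the write volume; on 2.x
substitute the backported TWCS class mentioned earlier in the thread):

CREATE TABLE metrics.raw_metrics (
    id     text,
    bucket text,        -- e.g. '2017-01-29', one partition per id per day
    time   timestamp,
    value  double,
    PRIMARY KEY ((id, bucket), time)
) WITH CLUSTERING ORDER BY (time DESC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'DAYS',
                    'compaction_window_size': '1'}
  AND default_time_to_live = 604800;  -- 7 days, matching the TTL discussed above

Queries for recent data then touch only one or a few buckets instead of
scanning the whole wide partition.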