Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Yes, I agree. I would say it cannot skip those cells because it doesn’t check the max timestamp of the cells of the sstable and therefore scans them one by one.
Hannu
> On 16 May 2017, at 19:48, Stefano Ortolani wrote:
>
> But it should skip those records since they are

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
But it should skip those records since they are sorted. My understanding would be something like:
1) read sstable 2
2) read the range tombstone
3) skip records from sstable2 and sstable1 within the range boundaries
4) read remaining records from sstable1
5) no records, return
On Tue, May 16,
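The expectation in the steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Cassandra reads sstables: `read_skipping_range` and its parameters are hypothetical names, clustering keys are modelled as a sorted list, and the sketch assumes every row inside the tombstone's bounds is older than the tombstone (timestamps are not checked at all).

```python
from bisect import bisect_left, bisect_right


def read_skipping_range(sorted_keys, tomb_start, tomb_end, reverse=False):
    """Return the keys not covered by the inclusive range [tomb_start, tomb_end].

    Because the keys are sorted, the covered span is located with two binary
    searches and jumped over, rather than being scanned row by row.
    Assumption: every covered row is older than the tombstone.
    """
    lo = bisect_left(sorted_keys, tomb_start)   # index of first covered key
    hi = bisect_right(sorted_keys, tomb_end)    # index just past last covered key
    survivors = sorted_keys[:lo] + sorted_keys[hi:]
    # A reverse read (ORDER BY ... DESC) walks the same survivors backwards.
    return survivors[::-1] if reverse else survivors
```

With keys 1 to 6 and a tombstone covering 2 through 5, only keys 1 and 6 survive, in either direction. Whether the real read path can take such a shortcut is exactly what the thread is asking.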

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
This is a bit of guessing, but it probably reads sstables in some sort of sequence, so even if sstable 2 contains the tombstone, it still scans through sstable 1 for possible data to be read.
BR, Hannu
> On 16 May 2017, at 19:40, Stefano Ortolani wrote:
>
> Little

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
Little update: the following query also times out, which is weird since the range tombstone should have been read by then...
SELECT * FROM test_cql.test_cf WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf AND timeid < the_oldest_deleted_timeid ORDER BY timeid DESC;
On Tue, May 16, 2017

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
Yes, that was my intention but I wanted to cross-check with the ML and the devs keeping an eye on it first.
On Tue, May 16, 2017 at 5:10 PM, Hannu Kröger wrote:
> Well,
>
> sstables contain some statistics about the cell timestamps and using that
> information and the

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Well, sstables contain some statistics about the cell timestamps, and using that information together with the tombstone timestamp it might be possible to skip some data, but I’m not sure that Cassandra currently does that. Maybe it would be worth opening a JIRA ticket to see what the devs think about it. If
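A minimal sketch of the check being suggested here, in Python. It is a guess at the shape of such an optimisation, not a description of what Cassandra does: `SSTableStats` and `can_skip_covered_range` are hypothetical names, and the only assumption is that each sstable exposes the minimum and maximum write timestamp of its cells.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    """Hypothetical per-sstable statistics (names are illustrative only)."""
    min_timestamp: int
    max_timestamp: int


def can_skip_covered_range(stats: SSTableStats, tombstone_timestamp: int) -> bool:
    """True if no cell in the sstable can be newer than the tombstone.

    If even the newest cell is older than the tombstone, every row that falls
    inside the tombstone's clustering range is shadowed, so that range could
    be skipped without inspecting individual cells. The comparison is strict
    so the sketch stays conservative when timestamps are equal.
    """
    return stats.max_timestamp < tombstone_timestamp
```

If the check fails, some cell might be newer than the tombstone, and the covered range would still have to be scanned.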

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Nitan Kainth
Thank you Stefano
> On May 16, 2017, at 10:56 AM, Stefano Ortolani wrote:
>
> No, because C* has reverse iterators.
>
> On Tue, May 16, 2017 at 4:47 PM, Nitan Kainth wrote:
> If the data is stored in ASC order and query asks

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
No, because C* has reverse iterators.
On Tue, May 16, 2017 at 4:47 PM, Nitan Kainth wrote:
> If the data is stored in ASC order and query asks for DESC, then wouldn’t
> it read whole partition in first and then pick data from reverse order?
>
> > On May 16, 2017, at 10:03 AM,

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Nitan Kainth
If the data is stored in ASC order and the query asks for DESC, then wouldn’t it read the whole partition first and then pick data in reverse order?
> On May 16, 2017, at 10:03 AM, Stefano Ortolani wrote:
>
> Hi Hannu,
>
> the piece of data in question is older. In my

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
Hi Hannu, the piece of data in question is older. In my example the tombstone is the newest piece of data. Since a range tombstone has information re the clustering key ranges, and the data is clustering key sorted, I would expect a linear scan not to be necessary. On Tue, May 16, 2017 at 3:46

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Well, as mentioned, Cassandra probably doesn’t have the logic and data to skip larger regions of deleted data based on a range tombstone. If some piece of data in a partition is newer than the tombstone, then it cannot be skipped. Therefore some partition level statistics of cell ages would need to

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Hello, If you mean how to construct a query like that: you use an ORDER BY clause with SELECT that is the reverse of the default, just like in the example below. If the table is constructed with “clustering order by (timeid ASC)” and you query “SELECT ... ORDER BY timeid DESC”, then the partition is

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
That is another way to see the question: are reverse iterators range tombstone aware? Yes. That is why I am puzzled by the aforementioned behavior. I would expect them to handle this case more gracefully.
Cheers, Stefano
On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth wrote:

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Nitan Kainth
Hannu, How can you read a partition in reverse?
Sent from my iPhone
> On May 16, 2017, at 9:20 AM, Hannu Kröger wrote:
>
> Well, I’m guessing that Cassandra doesn't really know if the range tombstone
> is useful for this or not.
>
> In many cases it might be that the

Re: Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Hannu Kröger
Well, I’m guessing that Cassandra doesn't really know if the range tombstone is useful for this or not. In many cases it might be that the partition contains data that is within the range of the tombstone but is newer than the tombstone, and therefore it might still be returned. Scanning
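The case described here, a row inside the tombstone's range that was written after the tombstone, can be illustrated with a small Python sketch. The function `visible_rows` and the row layout are hypothetical, and treating a row with the same timestamp as the tombstone as deleted is an assumption of this sketch.

```python
def visible_rows(rows, tomb_start, tomb_end, tomb_timestamp):
    """Filter (clustering_key, write_timestamp) pairs against one range tombstone.

    A row is hidden only if its key lies inside [tomb_start, tomb_end] AND it
    was written no later than the tombstone. A row inside the range that is
    newer than the tombstone is still returned.
    """
    return [
        (key, ts)
        for key, ts in rows
        if not (tomb_start <= key <= tomb_end and ts <= tomb_timestamp)
    ]
```

For rows `[(1, 5), (2, 5), (3, 50), (4, 5)]` and a tombstone over keys 2 to 4 written at timestamp 10, the result keeps `(1, 5)` (outside the range) and `(3, 50)` (inside the range but newer). This is why range bounds alone are not enough to decide that a span can be skipped.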

Range deletes, wide partitions, and reverse iterators

2017-05-16 Thread Stefano Ortolani
Hi all, I am seeing inconsistencies when mixing range tombstones, wide partitions, and reverse iterators. I have yet to understand whether this behaviour is to be expected, hence the message on the mailing list. The situation is conceptually simple. I am using a table defined as follows: CREATE