Re: question on maximum disk seeks

preetika tyagi Tue, 21 Mar 2017 11:14:32 -0700

Oh I see. I understand it now. Thank you for the clarification!

Preetika


On Tue, Mar 21, 2017 at 11:07 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Each sstable has its own partition index, so it's never updated.
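A minimal Python sketch of this point, assuming a toy model rather than Cassandra's real on-disk format. Every SSTable carries its own immutable partition index, so a key that was updated simply appears in the index of each SSTable holding a version of it, and a read merges those versions by timestamp. The names `SSTable`, `Cell`, `lookup` and `read` are hypothetical. The index is sorted by key for simplicity (Cassandra actually orders partitions by token), and a whole value stands in for an individual cell.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; Cassandra uses microseconds, any int works here


class SSTable:
    """Toy immutable SSTable: a data 'file' plus its own partition index."""

    def __init__(self, rows: dict[str, Cell]):
        # Data and index are built once at "flush" time and never modified.
        keys = sorted(rows)  # sorted by key here; Cassandra sorts partitions by token
        self._data = tuple(rows[k] for k in keys)
        # Partition index: key -> offset of that partition in this SSTable's data.
        self._index_keys = tuple(keys)
        self._index_offsets = tuple(range(len(keys)))

    def lookup(self, key: str) -> Cell | None:
        # "Seek" 1: binary-search this SSTable's own partition index.
        i = bisect_left(self._index_keys, key)
        if i == len(self._index_keys) or self._index_keys[i] != key:
            return None
        # "Seek" 2: jump to the recorded offset in the data.
        return self._data[self._index_offsets[i]]


def read(key: str, sstables: list[SSTable]) -> Cell | None:
    # The same key may live in several SSTables; each copy is found through
    # that SSTable's own index, and the newest timestamp wins.
    found = [c for c in (t.lookup(key) for t in sstables) if c is not None]
    return max(found, key=lambda c: c.timestamp, default=None)
```

For example, if an older SSTable holds `Cell("alice", 100)` for `user:1` and a later flush produced a second SSTable holding `Cell("alicia", 200)` for the same key, `read("user:1", [old, new])` returns the newer cell. Neither index was rewritten; the key just appears in both.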
>
> On Tue, Mar 21, 2017 at 11:04 AM preetika tyagi <preetikaty...@gmail.com>
> wrote:
>
>> Yes, I understand that. However, what I'm trying to understand is the
>> internal structure of the partition index. When a record associated with the
>> same partition key is updated, we have two different records with different
>> timestamps. These two records may end up in two different SSTables (at
>> least until compaction eventually merges them into one SSTable). What does
>> the partition index look like in that case? For the same key, we have two
>> different records in different SSTables. How does the partition index store
>> such information? Can it have repeated partition keys with different disk
>> offsets pointing to different SSTables?
>>
>> On Tue, Mar 21, 2017 at 10:09 AM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>> The partition index is never updated, as sstables are immutable.
>>
>> On Tue, Mar 21, 2017 at 9:40 AM preetika tyagi <preetikaty...@gmail.com>
>> wrote:
>>
>> Thank you Jan & Jeff for the responses. That was really useful.
>>
>> Jan - I have one follow-up question. When the data is spread over more
>> than one SSTable because of updates, as you mentioned, we will need two
>> seeks per SSTable (one for the partition index and another for the SSTable
>> itself). I'm curious how the partition index is structured internally. I
>> was assuming it to be a table of <key, disk offset> pairs. If the same key
>> is updated several times, how is that recorded in the partition index?
>>
>> Thanks,
>> Preetika
>>
>> On Mon, Mar 20, 2017 at 10:37 PM, <j.kes...@enercast.de> wrote:
>>
>> Hi,
>>
>> You're right: one seek on a hit in the partition key cache, and two if not.
>>
>> That's the theory, but there are two things to mention:
>>
>> First, you need two seeks per sstable, not per entire read. So if your data
>> is spread over multiple sstables on disk, you obviously need more than two
>> seeks. Think of frequently updated partition keys: combined with memory
>> pressure, you can easily end up with many sstables (OK, they will be
>> compacted at some point in the future).
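To make the per-sstable arithmetic concrete, here is a rough Python sketch of the seek count under the simple model discussed in this thread: each sstable that might contain the partition costs one seek on a partition key cache hit and two on a miss. The `SSTableInfo` fields are hypothetical stand-ins for the per-sstable bloom filter answer and the key cache answer. The model deliberately ignores fragmentation, the row cache, and the OS page cache.

```python
from dataclasses import dataclass


@dataclass
class SSTableInfo:
    may_contain_key: bool  # hypothetical: bloom filter says "maybe" (True) or "no" (False)
    key_cache_hit: bool    # hypothetical: partition key cache hit for this sstable


def worst_case_seeks(sstables):
    """Count disk seeks for one partition read under the thread's simple model.

    For each sstable the bloom filter cannot rule out:
      - key cache hit  -> 1 seek (straight to the data via the in-memory
                          compression offset map)
      - key cache miss -> 2 seeks (partition index, then data)
    """
    seeks = 0
    for t in sstables:
        if not t.may_contain_key:
            continue  # ruled out in memory, no disk access
        seeks += 1 if t.key_cache_hit else 2
    return seeks
```

A partition spread over four sstables, with a key cache hit in only one of them, costs 1 + 3 * 2 = 7 seeks in this model, which is why frequently updated keys plus many un-compacted sstables quickly exceed the "two seeks" figure.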
>>
>> Second, there could be fragmentation on disk, which leads to seeks even
>> during sequential reads.
>>
>> Jan
>>
>> *From: *preetika tyagi <preetikaty...@gmail.com>
>> *Sent: *Monday, March 20, 2017 21:18
>> *To: *user@cassandra.apache.org
>> *Subject: *question on maximum disk seeks
>>
>> I'm trying to understand the maximum number of disk seeks required in a
>> read operation in Cassandra. I looked at several online articles, including
>> this one: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html
>>
>> As I understand it, two disk seeks are required in the worst case.
>> One is for reading the partition index and another is for reading the actual
>> data from the compressed partition. The location of the data in compressed
>> partitions is obtained from the compression offset tables (which are stored
>> in memory). Am I on the right track here? Will there ever be a case when
>> more than one disk seek is required to read the data?
>>
>> Thanks,
>>
>> Preetika
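The in-memory compression offset lookup described above can be sketched as follows, assuming data is compressed in fixed-size chunks of `chunk_length` uncompressed bytes (for example 64 KB) and that the offset map is simply a list of the compressed-file offsets where each chunk starts. The function name and argument layout are hypothetical, and the sketch ignores the per-chunk checksums the real format stores. The point it illustrates is that locating the chunk needs no disk access, so reading it costs a single seek.

```python
def locate_chunk(uncompressed_pos, chunk_length, chunk_offsets, compressed_file_length):
    """Map a position in the uncompressed data to the compressed chunk holding it.

    chunk_offsets[i] is the byte offset in the compressed file where chunk i
    starts (the in-memory offset map). Returns (start, length, offset_in_chunk):
    seek once to `start`, read `length` compressed bytes, decompress, then skip
    `offset_in_chunk` bytes to reach the requested position.
    """
    index = uncompressed_pos // chunk_length
    if not 0 <= index < len(chunk_offsets):
        raise ValueError("position is outside the data covered by the offset map")
    start = chunk_offsets[index]
    # A chunk ends where the next one starts; the last chunk ends at end of file.
    if index + 1 < len(chunk_offsets):
        end = chunk_offsets[index + 1]
    else:
        end = compressed_file_length
    return start, end - start, uncompressed_pos % chunk_length
```

With 64 KB chunks and offsets `[0, 20000, 41000]`, uncompressed position 70000 falls in chunk 1: seek to byte 20000, read 21000 compressed bytes, decompress, and skip 4464 bytes.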
