Oh, I see. I understand it now. Thank you for the clarification!

Preetika

On Tue, Mar 21, 2017 at 11:07 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> Each sstable has its own partition index, therefore it's never updated.
>
> On Tue, Mar 21, 2017 at 11:04 AM preetika tyagi <preetikaty...@gmail.com>
> wrote:
>
>> Yes, I understand that. However, what I'm trying to understand is the
>> internal structure of the partition index. When a record associated with
>> the same partition key is updated, we have two different records with
>> different timestamps. These two records may end up split across two
>> different SSTables (as long as compaction has not yet merged them into
>> one SSTable). What does the partition index look like in such a case?
>> For the same key, we have two different records in different SSTables.
>> How does the partition index store such information? Can it have repeated
>> partition keys with different disk offsets pointing to different SSTables?
>>
>> On Tue, Mar 21, 2017 at 10:09 AM, Jonathan Haddad <j...@jonhaddad.com>
>> wrote:
>>
>> The partition index is never updated, as sstables are immutable.
>>
>> On Tue, Mar 21, 2017 at 9:40 AM preetika tyagi <preetikaty...@gmail.com>
>> wrote:
>>
>> Thank you Jan & Jeff for the responses. That was really useful.
>>
>> Jan - I have one follow-up question. When the data is spread over more
>> than one SSTable in case of updates as you mentioned, we will need two
>> seeks per SSTable (one for the partition index and another for the
>> SSTable itself). I'm curious to know how the partition index is
>> structured internally. I was assuming it to be a table with
>> <key, disk offset> pairs. If the same key is updated several times, how
>> is that recorded in the partition index?
>>
>> Thanks,
>> Preetika
>>
>> On Mon, Mar 20, 2017 at 10:37 PM, <j.kes...@enercast.de> wrote:
>>
>> Hi,
>>
>> You're right: one seek with a hit in the partition key cache, and two
>> if not.
>>
>> That's the theory, but there are two things to mention:
>>
>> First, you need two seeks per sstable, not per entire read. So if your
>> data is spread over multiple sstables on disk, you obviously need more
>> than two reads. Think of frequently updated partition keys: in
>> combination with memory pressure you can easily end up with maaany
>> sstables (OK, they will be compacted at some point in the future).
>>
>> Second, there could be fragmentation on disk, which leads to seeks
>> during sequential reads.
>>
>> Jan
>>
>> Sent from my Windows 10 Phone
>>
>> *From:* preetika tyagi <preetikaty...@gmail.com>
>> *Sent:* Monday, March 20, 2017 21:18
>> *To:* user@cassandra.apache.org
>> *Subject:* question on maximum disk seeks
>>
>> I'm trying to understand the maximum number of disk seeks required in a
>> read operation in Cassandra. I looked at several online articles,
>> including this one:
>> https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html
>>
>> As per my understanding, two disk seeks are required in the worst case.
>> One is for reading the partition index and another is to read the actual
>> data from the compressed partition. The index of the data in compressed
>> partitions is obtained from the compression offset tables (which are
>> stored in memory). Am I on the right track here? Will there ever be a
>> case when more than 1 disk seek is required to read the data?
>>
>> Thanks,
>>
>> Preetika
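
For anyone reading this thread later, here is a rough Python sketch of the two points made above: each SSTable carries its own immutable partition index, and the seek count is per SSTable rather than per read. This is a toy model, not Cassandra's actual code. The names `Row`, `SSTable`, `read_partition` and `demo` are made up for illustration, and the model assumes an SSTable that does not contain the key costs no seek at all (roughly what a bloom filter buys you, ignoring false positives).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    value: str
    timestamp: int  # write timestamp; the newest one wins on read


@dataclass
class SSTable:
    """Toy immutable SSTable: a data 'file' plus its own partition index."""
    data: Tuple[Row, ...]              # rows stored at successive offsets
    partition_index: Dict[str, int]    # partition key -> offset into data
    key_cache: Dict[str, int] = field(default_factory=dict)

    def read(self, key: str) -> Tuple[Optional[Row], int]:
        """Return (row, seeks) for this SSTable only."""
        if key in self.key_cache:
            # Key cache hit: the offset is known, only the data seek remains.
            return self.data[self.key_cache[key]], 1
        if key not in self.partition_index:
            # Assumption: a bloom filter lets us skip this SSTable for free.
            return None, 0
        # Cache miss: one seek into the partition index, one into the data.
        offset = self.partition_index[key]
        self.key_cache[key] = offset
        return self.data[offset], 2


def read_partition(sstables: List[SSTable], key: str) -> Tuple[Optional[Row], int]:
    """Consult every SSTable, keep the newest row, and sum the seeks."""
    newest: Optional[Row] = None
    total_seeks = 0
    for sstable in sstables:
        row, seeks = sstable.read(key)
        total_seeks += seeks
        if row is not None and (newest is None or row.timestamp > newest.timestamp):
            newest = row
    return newest, total_seeks


def demo():
    # The same key was written twice and flushed into two separate SSTables.
    # Neither index is ever updated; each simply has its own entry for "k".
    older = SSTable(data=(Row("v1", 100),), partition_index={"k": 0})
    newer = SSTable(data=(Row("v2", 200),), partition_index={"k": 0})
    cold = read_partition([older, newer], "k")  # key caches are empty
    warm = read_partition([older, newer], "k")  # key caches are populated
    return cold, warm
```

In this model the first (cold) read costs two seeks in each of the two SSTables, the second (warm) read costs one seek in each, and both return the row with the newer timestamp. No index ever holds the same key twice; the duplication lives across SSTables until compaction merges them.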