Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
Question about the read path in Cassandra.  If a partition/row is in the 
Memtable and is being actively written to by other clients, will a READ of 
that partition also have to hit SSTables on disk (or in the page cache)?  Or 
can it be serviced entirely from the Memtable?

If you select all columns (e.g., "select * from …"), then I can imagine 
that Cassandra would need to merge whatever columns are in the Memtable with 
what's in SSTables on disk.

But if you select a single column (e.g., "select Name from … where id = …") 
and if that column is in the Memtable, I'd hope Cassandra could skip 
checking the disk.  Can it do this optimization?

Thanks, Don

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com


Re: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Jonathan Haddad
No.  Consider a scenario where you write with a supplied timestamp a week in
the future, flush that write to an SSTable, and then write again with the
current timestamp.  The record on disk will have a timestamp greater than the
one in the memtable, so the on-disk value wins and the read cannot be
answered from the memtable alone.
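
The scenario above can be sketched in a few lines of Python. This is only a
toy model of last-write-wins reconciliation, not Cassandra's code: the `Cell`
type, the `reconcile` helper, and the dictionaries standing in for the
memtable and the SSTable are all hypothetical names made up for illustration.

```python
from typing import NamedTuple, Optional

class Cell(NamedTuple):
    value: str
    timestamp: int  # write timestamp in microseconds, supplied with the write

def reconcile(*cells: Optional[Cell]) -> Optional[Cell]:
    """Return the cell with the highest timestamp, ignoring missing cells."""
    present = [c for c in cells if c is not None]
    return max(present, key=lambda c: c.timestamp) if present else None

WEEK_US = 7 * 24 * 3600 * 1_000_000
now = 1_413_994_680_000_000  # arbitrary "current" time for the example

# First write carries a timestamp a week in the future and has been flushed.
sstable = {"id1": Cell("from-the-future", now + WEEK_US)}
# Second write uses the current timestamp and still sits in the memtable.
memtable = {"id1": Cell("written-later", now)}

memtable_only = reconcile(memtable.get("id1"))
merged = reconcile(memtable.get("id1"), sstable.get("id1"))

print(memtable_only.value)  # what a memtable-only read would return
print(merged.value)         # what a read that also checks disk returns
```

In this model the memtable-only read returns the value that was written
later in wall-clock time, while the merged read returns the flushed value,
because it carries the higher timestamp.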

-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade


RE: Is cassandra smart enough to serve Read requests entirely from Memtables in some cases?

2014-10-22 Thread Donald Smith
I discussed this question on the Cassandra IRC channel.  I learned that the 
timestamp in the Memtable may be OLDER than the timestamp in some SSTable 
(e.g., due to hints or retries).  So there’s no guarantee that the Memtable has 
the most recent version.

But there may be cases, they say, in which the timestamps of the SSTables can 
be used to skip over SSTables that hold only older data (via metadata on 
SSTables, I presume).
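
To make the idea concrete, here is a rough Python sketch of how per-SSTable
timestamp metadata could let a single-column read stop early. It is a
simplified model under my own assumptions, not Cassandra's implementation:
`SSTable`, `max_timestamp`, and `read_column` are hypothetical names, each
"SSTable" holds one column per key, and tombstones are ignored.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

class Cell(NamedTuple):
    value: str
    timestamp: int

class SSTable(NamedTuple):
    name: str
    max_timestamp: int     # newest write timestamp stored in this file
    rows: Dict[str, Cell]  # key -> cell for the single column being read

def read_column(key: str, memtable: Dict[str, Cell],
                sstables: List[SSTable]) -> Tuple[Optional[Cell], List[str]]:
    """Return (newest cell found, names of the SSTables actually consulted)."""
    best = memtable.get(key)
    consulted: List[str] = []
    # Visit SSTables from the newest max_timestamp to the oldest.
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        if best is not None and best.timestamp > table.max_timestamp:
            break  # this file and all remaining ones hold only older data
        consulted.append(table.name)
        candidate = table.rows.get(key)
        if candidate is not None and (best is None
                                      or candidate.timestamp > best.timestamp):
            best = candidate
    return best, consulted

memtable = {"id1": Cell("memtable-value", 300)}
older = [SSTable("a", 100, {"id1": Cell("v1", 100)}),
         SSTable("b", 200, {})]
# An SSTable holding a write with a later timestamp than the memtable's.
with_newer = older + [SSTable("c", 900, {"id1": Cell("late-ts", 900)})]

hit_a, consulted_a = read_column("id1", memtable, older)
hit_b, consulted_b = read_column("id1", memtable, with_newer)
print(hit_a.value, consulted_a)
print(hit_b.value, consulted_b)
```

In the first call every SSTable's newest timestamp is older than the
memtable's cell, so no SSTable is consulted. In the second call one SSTable
may hold newer data, so it has to be read, and in this model its value wins.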

Memtables are like write-through caches: they hold recent writes and do NOT 
correspond to SSTables loaded from disk.
