You’re comparing single-machine key/value stores to a distributed db with a much richer data model (partitions/slices, statics, range reads, range deletions, etc). They’re going to read very differently. Instead of explaining why they’re not like RocksDB/LevelDB, how about you tell us what you’re trying to do / learn so we can answer the real question?
A few other notes inline.

-- Jeff Jirsa

> On Jan 8, 2019, at 10:51 PM, Jinhua Luo <luajit...@gmail.com> wrote:
>
> Thanks. Let me clarify my questions more.
>
> 1) For memtable, if the selected columns (assuming they are in simple
> types) could be found in memtable only, why bother to search sstables
> then? In leveldb and rocksdb, they would stop consulting sstables if
> the memtable already fulfills the query.

We stop at the memtable if we know that’s all we need. This depends on a lot of factors (schema, point read vs slice, etc).

>
> 2) For STCS and LCS, obviously, the sstables are grouped in
> generations (old mutations would be promoted into next level or bucket),
> so why not search the columns level by level (or bucket by bucket)
> until all selected columns are collected? In leveldb and rocksdb, they
> do in this way.

They’re single machine and Cassandra isn’t. There’s no guarantee in Cassandra that the small sstables in STCS or low levels in LCS are newest:
- you can write arbitrary timestamps into the memtable
- read repair can put old data in the memtable
- streaming (bootstrap/repair) can put old data into new files
- user processes (nodetool refresh) can put old data into new files

>
> 3) Could you explain the collection, cdt and counter types in more
> detail? Do they need to iterate all sstables? Because they could not
> be simply filtered by timestamp or value range.
>

I can’t (combination of time available and it’s been a long time since I’ve dealt with that code and I don’t want to misspeak).

> For collection, when I select a column of collection type, e.g.
> map<text, text>, to ensure the whole set of map fields is collected,
> it is necessary to search in all sstables.
>
> For cdt, it needs to ensure all fields of the cdt are collected.
>
> For counter, it needs to merge all mutations distributed in all
> sstables to give a final state of counter value.
>
> Am I correct?
> If so, these three complex types seem less
> efficient than simple types, right?
>
> Jeff Jirsa <jji...@gmail.com> wrote on Tue, Jan 8, 2019 at 11:58 PM:
>>
>> First:
>>
>> Compaction controls how sstables are combined but not how they’re read. The
>> read path (with one tiny exception) doesn’t know or care which compaction
>> strategy you’re using.
>>
>> A few more notes inline.
>>
>>> On Jan 8, 2019, at 3:04 AM, Jinhua Luo <luajit...@gmail.com> wrote:
>>>
>>> Hi All,
>>>
>>> The compaction would organize the sstables, e.g. with LCS, the
>>> sstables would be categorized into levels, and the read path should
>>> read sstables level by level until the read is fulfilled, correct?
>>
>> LCS levels are to minimize the number of sstables scanned - at most one per
>> level - but there’s no attempt to fulfill the read with low levels beyond
>> the filtering done by timestamp.
>>
>>>
>>> For STCS, it would search sstables in buckets from smallest to largest?
>>
>> Nope. No attempt to do this.
>>
>>>
>>> What about other compaction cases? They would iterate all sstables?
>>
>> In all cases, we’ll use a combination of bloom filters and sstable metadata
>> and indices to include / exclude sstables. If the bloom filter hits, we’ll
>> consider things like timestamps and whether or not the min/max clustering of
>> the sstable matches the slice we care about. We don’t consult the compaction
>> strategy, though the compaction strategy may have (in the case of LCS or
>> TWCS) placed the sstables into a state that makes this read less expensive.
>>
>>>
>>> But in the code, I'm confused a lot:
>>> In
>>> org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
>>> it seems that no matter whether the selected columns (except the
>>> collection/cdt and counter cases, let's assume here the selected
>>> columns are simple cell) are collected and satisfied, it would search
>>> both memtable and all sstables, regardless of the compaction strategy.
>>
>> There’s another that includes timestamps that will do some smart-ish
>> exclusion of sstables that aren’t needed for the read command.
>>
>>>
>>> Why?
>>>
>>> Moreover, for collection/cdt (non-frozen) and counter types, it would
>>> need to iterate all sstables to ensure the whole set of the fields is
>>> collected, correct? If so, such multi-cell or counter types are
>>> heavyweight in performance, correct?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>> For additional commands, e-mail: user-h...@cassandra.apache.org
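
As a footnote to the timestamp discussion above, here is a toy Python model of why a read cannot stop at the "newest-looking" file, and how per-file max-timestamp metadata can still let it skip files. This is a sketch under heavy assumptions, not Cassandra's code: SSTable, read_columns and the sample data are all invented for illustration, and it only models simple cells (no collections, counters, tombstones, bloom filters or clustering ranges).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable: a name plus its cells."""
    name: str
    cells: Dict[str, Cell]  # column name -> (timestamp, value); assumed non-empty

    @property
    def max_timestamp(self) -> int:
        # Plays the role of per-file metadata: newest write timestamp in the file.
        return max(ts for ts, _ in self.cells.values())


def read_columns(sstables: List[SSTable], wanted: List[str]):
    """Return (newest cell per wanted column, names of the files consulted)."""
    result: Dict[str, Cell] = {}
    consulted: List[str] = []
    # Visit files by descending max timestamp, not by size, level or file age.
    for table in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        # Once every wanted column has a cell newer than anything this file
        # (or any later file in this order) can contain, the rest can be skipped.
        if len(result) == len(wanted) and all(
            result[col][0] > table.max_timestamp for col in wanted
        ):
            break
        consulted.append(table.name)
        for col in wanted:
            cell = table.cells.get(col)
            # Reconcile by write timestamp: the highest timestamp wins.
            if cell is not None and (col not in result or cell[0] > result[col][0]):
                result[col] = cell
    return result, consulted


# Sample data: the small, recently created file holds OLD data (as could happen
# after streaming or nodetool refresh), while the big old file holds the newest.
tables = [
    SSTable("small_new_file", {"a": (5, "old-a"), "c": (4, "c")}),
    SSTable("mid_file", {"b": (50, "stale-b")}),
    SSTable("big_old_file", {"a": (100, "new-a"), "b": (90, "b")}),
]

print(read_columns(tables, ["a", "b"]))
print(read_columns(tables, ["a", "c"]))
```

In the sample, reading columns a and b only touches big_old_file, because its cells are newer than the max timestamp of every other file. Reading a and c has to touch all three files, since c only exists in the file with the lowest max timestamp. Searching "smallest file first" or "lowest level first" would have returned the stale value of a.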