[jira] Updated: (CASSANDRA-1106) Use Scanner API for all reads

Stu Hood (JIRA) Wed, 19 May 2010 08:40:16 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stu Hood updated CASSANDRA-1106:
--------------------------------

    Description: 
The goal of this issue is to eliminate the IColumnIterator interface, and to 
use the Slice/Scanner API for all reads. Additionally, this issue begins to 
optimize the interaction between FilteredScanner and QueryFilter to gain back 
speed lost in CASSANDRA-1095.

This issue adds Memtable.Scanner and converts Memtables to maps from 
DecoratedKey -> List<Slice> (where the list represents a row: one entry for 
Standard CFs, and more than one entry for Super CFs). Since Slices are 
immutable, rows in the Memtable are merged using SliceMergingIterator, and 
atomically swapped out. This makes atomicity much coarser-grained than what we 
currently support, so this approach to mapping the Memtable to Slices is wide 
open to debate.
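
To make the merge-and-swap idea concrete, here is a minimal sketch, not the code in the attached patches. SketchSlice and SketchMemtable are hypothetical names, row keys are plain Strings instead of DecoratedKey, and the merge step simply appends where the patch would use SliceMergingIterator. It only illustrates swapping a whole immutable row with a compare-and-set loop.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for an immutable Slice: a fixed list of column names.
final class SketchSlice {
    final List<String> columns;

    SketchSlice(List<String> columns) {
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }
}

// Hypothetical memtable: row key -> immutable list of slices.
final class SketchMemtable {
    private final ConcurrentMap<String, List<SketchSlice>> rows = new ConcurrentHashMap<>();

    // Build a new row from the current one plus the update, then swap the
    // whole row in atomically. If another writer won the race, retry.
    void apply(String key, SketchSlice update) {
        while (true) {
            List<SketchSlice> current = rows.get(key);
            List<SketchSlice> merged = merge(current, update);
            boolean swapped = (current == null)
                    ? rows.putIfAbsent(key, merged) == null
                    : rows.replace(key, current, merged);
            if (swapped) {
                return;
            }
        }
    }

    // Placeholder merge (assumption): appends the update rather than
    // merging overlapping slices the way a real merging iterator would.
    private static List<SketchSlice> merge(List<SketchSlice> current, SketchSlice update) {
        List<SketchSlice> out = new ArrayList<>();
        if (current != null) {
            out.addAll(current);
        }
        out.add(update);
        return Collections.unmodifiableList(out);
    }

    List<SketchSlice> get(String key) {
        return rows.get(key);
    }
}
```

A reader holding the list returned by get() keeps seeing the old row after a later apply(), which is the row-level (rather than column-level) atomicity described above.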

The row cache in this patch mimics the Memtable and becomes a map from 
DecoratedKey -> List<Slice>. In order to reuse the QueryFilter API, a 
db.ListScanner is added to wrap an individual row in the cache for filtering. 
One limitation imposed by this design is that the row cache can't be used as a 
write-through cache, since its entries are immutable.
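
As a rough illustration of wrapping one cached row for filtering, the sketch below models a row as a sorted, immutable list of column names and exposes a seek operation over it. SketchListScanner and its methods are hypothetical and are not the db.ListScanner from the patch, whose actual interface is not shown in this message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical scanner over a single cached row (a sorted list of names).
final class SketchListScanner {
    private final List<String> row;
    private int pos = 0;

    SketchListScanner(List<String> sortedRow) {
        // Copy and wrap so the scanner never observes later mutation.
        this.row = Collections.unmodifiableList(new ArrayList<>(sortedRow));
    }

    // Position at the first entry >= target; false if no such entry exists.
    boolean seekTo(String target) {
        int i = Collections.binarySearch(row, target);
        pos = (i >= 0) ? i : -i - 1; // a negative result encodes the insertion point
        return pos < row.size();
    }

    boolean hasNext() {
        return pos < row.size();
    }

    // Caller is expected to check hasNext() first.
    String next() {
        return row.get(pos++);
    }
}
```

Because the wrapped row is never modified, an update has to replace the cache entry as a whole, which is the reason given above for the cache not being write-through.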

The common order of operations is:
1. Get a SeekableScanner implementation for the Memtable/cache entry/SSTable
2. Build a QueryFilter describing the query
3. Call QueryFilter.filter(scanner) to wrap the SeekableScanner in a 
FilteredScanner
   * Optionally, merge multiple Scanners using MergingScanner
4. Call QueryFilter.collect(scanner) to wrap garbage collection around the 
merged input
5. Limit the output columns using QueryFilter.limit(scanner)
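
The order of operations above can be sketched as follows. This is an illustration under stated assumptions, not the QueryFilter API: SketchColumn and ReadPipeline are hypothetical, each step is eager and list-based for brevity where the patch wraps scanners, and garbage collection is reduced to dropping tombstones.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical column record; not Cassandra's column class.
final class SketchColumn {
    final String name;
    final long timestamp;
    final boolean tombstone;

    SketchColumn(String name, long timestamp, boolean tombstone) {
        this.name = name;
        this.timestamp = timestamp;
        this.tombstone = tombstone;
    }
}

final class ReadPipeline {
    // Step 3: keep only the columns the query asks for.
    static List<SketchColumn> filter(Iterator<SketchColumn> scanner, Predicate<String> wanted) {
        List<SketchColumn> out = new ArrayList<>();
        while (scanner.hasNext()) {
            SketchColumn c = scanner.next();
            if (wanted.test(c.name)) {
                out.add(c);
            }
        }
        return out;
    }

    // Optional step: merge several sources; the newest timestamp wins per name.
    static List<SketchColumn> merge(List<List<SketchColumn>> sources) {
        TreeMap<String, SketchColumn> byName = new TreeMap<>();
        for (List<SketchColumn> source : sources) {
            for (SketchColumn c : source) {
                byName.merge(c.name, c, (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        return new ArrayList<>(byName.values());
    }

    // Step 4: collect garbage on the merged input (here: drop tombstones).
    static List<SketchColumn> collect(List<SketchColumn> merged) {
        List<SketchColumn> out = new ArrayList<>();
        for (SketchColumn c : merged) {
            if (!c.tombstone) {
                out.add(c);
            }
        }
        return out;
    }

    // Step 5: limit the number of output columns.
    static List<SketchColumn> limit(List<SketchColumn> columns, int max) {
        return new ArrayList<>(columns.subList(0, Math.min(max, columns.size())));
    }
}
```

In this sketch, collecting before merging would discard a tombstone held by one source before it could shadow an older live value in another, which is one reason to apply collection to the merged input.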

Optimization between FilteredScanner and QueryFilter is accomplished via the 
MatchResult object, which is pretty ugly, and still a work in progress. 
Within a QueryFilter, IFilters for each level return MatchResults 
indicating where their next interesting matches are, and QueryFilter composes 
the levels into a MatchResult that a FilteredScanner uses to seek on its 
underlying Scanner.
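
The seek-hint idea can be illustrated with a single-level sketch. Everything here is an assumption made for illustration: SeekHint, NamesHintFilter and HintedScan are hypothetical names, the real MatchResult shape is not shown in this message, and composing hints across multiple levels is left out. The filter either accepts the current name or reports the next name it cares about, and the scan loop seeks there instead of stepping through every column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical hint: a match, or the next name worth seeking to (null = done).
final class SeekHint {
    final boolean matched;
    final String seekTo;

    private SeekHint(boolean matched, String seekTo) {
        this.matched = matched;
        this.seekTo = seekTo;
    }

    static SeekHint match() {
        return new SeekHint(true, null);
    }

    static SeekHint seek(String next) {
        return new SeekHint(false, next);
    }
}

// Hypothetical filter over a sorted set of wanted column names.
final class NamesHintFilter {
    private final NavigableSet<String> wanted;

    NamesHintFilter(NavigableSet<String> wanted) {
        this.wanted = wanted;
    }

    SeekHint test(String name) {
        if (wanted.contains(name)) {
            return SeekHint.match();
        }
        return SeekHint.seek(wanted.higher(name)); // null when nothing is left
    }
}

final class HintedScan {
    // Returns matching names; visited[0] counts the columns actually examined.
    static List<String> scan(NavigableSet<String> columns, NamesHintFilter filter, int[] visited) {
        List<String> out = new ArrayList<>();
        String pos = columns.isEmpty() ? null : columns.first();
        while (pos != null) {
            visited[0]++;
            SeekHint hint = filter.test(pos);
            if (hint.matched) {
                out.add(pos);
                pos = columns.higher(pos);           // step to the next column
            } else if (hint.seekTo == null) {
                break;                               // no further matches possible
            } else {
                pos = columns.ceiling(hint.seekTo);  // jump past uninteresting columns
            }
        }
        return out;
    }
}
```

Each hint points strictly past the current position, so the loop always makes progress, and columns between two wanted names are never examined.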

These patches remove a lot of deeply nested and complicated logic for dealing 
with super columns and garbage collection, including IFilter.filterSuperColumn 
(replaced naturally by Slice filtering), IFilter.collectReducedColumns (ditto) 
and ColumnFamilyStore.removeDeleted (replaced by ASlice.GCFunction). 
Additionally, they replace scads of AbstractIterator implementations that were 
implementing IColumnIterator on a case by case basis.

  was:
The goal of this issue is to eliminate the IColumnIterator interface, and to 
use the Slice/Scanner API for all reads. Additionally, this issue begins to 
optimize the interaction between FilteredScanner and QueryFilter to gain back 
speed lost in CASSANDRA-1095.

This issue adds Memtable.Scanner and converts Memtables to maps from 
DecoratedKey -> List<Slice> (where the list represents a row: one entry for 
Standard CFs, and more than one entry for Super CFs). Since Slices are 
immutable, rows in the Memtable are merged using SliceMergingIterator, and 
atomically swapped out. This is much less granular atomicity than we support 
currently, so this approach to mapping the Memtable to Slices is wide open to 
debate.

The row cache in this patch mimics the Memtable and becomes a map from 
DecoratedKey -> List<Slice>. In order to reuse the QueryFilter API, a 
db.ListScanner is added to wrap an individual row in the cache for filtering. 
One limitation imposed by this design is that the row cache can't be used as a 
write-through cache, since its entries are immutable.

The common order of operations is:
# Get a SeekableScanner implementation for the Memtable/cache entry/SSTable
# Build a QueryFilter describing the query
# Call QueryFilter.filter(scanner) to wrap the SeekableScanner in a 
FilteredScanner
* Optionally, merge multiple Scanners using MergingScanner
# Call QueryFilter.collect(scanner) to wrap garbage collection around the 
merged input
# Limit the output columns using QueryFilter.limit(scanner)

Optimization between FilteredScanner and QueryFilter is accomplished via the 
MatchResult object, which is pretty ugly, and still a work in progress. 
Internally to a QueryFilter, IFilters for each level return MatchResults 
indicating where their next interesting matches are, and QueryFilter composes 
the levels into a MathResult that a FilteredScanner uses to see on its 
underlying Scanner.

These patches remove a lot of deeply nested and complicated logic for dealing 
with super columns and garbage collection, including IFilter.filterSuperColumn 
(replaced naturally by Slice filtering), IFilter.collectReducedColumns (ditto) 
and ColumnFamilyStore.removeDeleted (replaced by ASlice.GCFunction). 
Additionally, they replace scads of AbstractIterator implementations that were 
implementing IColumnIterator on a case by case basis.


> Use Scanner API for all reads
> -----------------------------
>
>                 Key: CASSANDRA-1106
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1106
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Stu Hood
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 0001-Implement-transitional-CF-Slice-API.patch, 
> 0002-Per-parent-Slice-based-atomicity-for-Memtables.patch, 
> 0003-Use-Scanner-API-in-RowIteratorFactory-and-port-getTo.patch, 
> 0004-Remove-IColumnIterator-and-other-stale-I-Filter-code.patch, 
> 0005-Add-limit-parameter-to-QueryFilter-rather-than-level.patch, 
> 0006-Add-MatchResult-to-give-FilteredScanner-hints-to-fin.patch, 
> 0007-Compose-level-MatchResults-in-QueryFilter-and-begin-.patch, 
> 0008-Add-IFilter.initial-to-return-the-first-interesting-.patch
>
>
> The goal of this issue is to eliminate the IColumnIterator interface, and to 
> use the Slice/Scanner API for all reads. Additionally, this issue begins to 
> optimize the interaction between FilteredScanner and QueryFilter to gain back 
> speed lost in CASSANDRA-1095.
> This issue adds Memtable.Scanner and converts Memtables to maps from 
> DecoratedKey -> List<Slice> (where the list represents a row: one entry for 
> Standard CFs, and more than one entry for Super CFs). Since Slices are 
> immutable, rows in the Memtable are merged using SliceMergingIterator, and 
> atomically swapped out. This makes atomicity much coarser-grained than what 
> we currently support, so this approach to mapping the Memtable to Slices is 
> wide open to debate.
> The row cache in this patch mimics the Memtable and becomes a map from 
> DecoratedKey -> List<Slice>. In order to reuse the QueryFilter API, a 
> db.ListScanner is added to wrap an individual row in the cache for filtering. 
> One limitation imposed by this design is that the row cache can't be used as 
> a write-through cache, since its entries are immutable.
> The common order of operations is:
> 1. Get a SeekableScanner implementation for the Memtable/cache entry/SSTable
> 2. Build a QueryFilter describing the query
> 3. Call QueryFilter.filter(scanner) to wrap the SeekableScanner in a 
> FilteredScanner
>    * Optionally, merge multiple Scanners using MergingScanner
> 4. Call QueryFilter.collect(scanner) to wrap garbage collection around the 
> merged input
> 5. Limit the output columns using QueryFilter.limit(scanner)
> Optimization between FilteredScanner and QueryFilter is accomplished via the 
> MatchResult object, which is pretty ugly, and still a work in progress. 
> Within a QueryFilter, IFilters for each level return MatchResults 
> indicating where their next interesting matches are, and QueryFilter composes 
> the levels into a MatchResult that a FilteredScanner uses to seek on its 
> underlying Scanner.
> These patches remove a lot of deeply nested and complicated logic for dealing 
> with super columns and garbage collection, including 
> IFilter.filterSuperColumn (replaced naturally by Slice filtering), 
> IFilter.collectReducedColumns (ditto) and ColumnFamilyStore.removeDeleted 
> (replaced by ASlice.GCFunction). Additionally, they replace scads of 
> AbstractIterator implementations that were implementing IColumnIterator on a 
> case by case basis.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
