[ 
https://issues.apache.org/jira/browse/CASSANDRA-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134391#comment-15134391
 ] 

Paulo Motta commented on CASSANDRA-10990:
-----------------------------------------

Initial version is ready for review. Feedback on approach and correctness will 
be greatly appreciated.

*Patch Overview*

The patch adds support for streaming pre-3.0 sstables and a comprehensive test 
suite around it. Adding support for non-static-compact tables was simple: I 
worked around the lack of a serialization header by using a header with no 
stats, and deserialized the clustering prefix with the old-format deserializer 
while serializing in the new format.

The main challenge was supporting streaming of compact static tables, 
because in the new format the static columns must be the first columns in a 
partition, while in the previous format they can be in any position of the 
partition. This means that each partition must be traversed to search for 
static columns and then rewound to search for the remaining non-static columns.
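
To make the two-pass traversal concrete, here is a minimal sketch using only 
standard {{java.io}} mark/reset. It is not the patch's code: the class 
{{TwoPassPartitionReader}} and the toy encoding (a cell count followed by 
"is static" flag and name pairs) are assumptions made up for illustration and 
have nothing to do with the real sstable format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the patch's code or the real sstable format.
class TwoPassPartitionReader {

    // Encode a toy "partition": a cell count, then (isStatic, name) pairs in any order.
    static byte[] encode(String[] names, boolean[] isStatic) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(names.length);
        for (int i = 0; i < names.length; i++) {
            out.writeBoolean(isStatic[i]);
            out.writeUTF(names[i]);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Return the cell names with statics first, reading the partition twice.
    static List<String> staticsFirst(byte[] partition) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(partition));
        List<String> ordered = new ArrayList<>();
        in.mark(partition.length);   // remember the start of the partition
        collect(in, true, ordered);  // first pass: static cells only
        in.reset();                  // rewind to the mark
        collect(in, false, ordered); // second pass: non-static cells
        return ordered;
    }

    // Walk every cell and keep the names whose static flag matches wantStatic.
    private static void collect(DataInputStream in, boolean wantStatic, List<String> into)
            throws IOException {
        int cells = in.readInt();
        for (int i = 0; i < cells; i++) {
            boolean isStatic = in.readBoolean();
            String name = in.readUTF();
            if (isStatic == wantStatic) {
                into.add(name);
            }
        }
    }
}
```

This only works here because {{ByteArrayInputStream}} supports mark/reset 
natively; a network stream does not, which is why some form of caching is 
needed on the streaming path.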

To solve this I added a new {{CachedInputStream}} that adds mark/reset 
functionality to a source stream and allows multiple {{CachedInputStream}} 
instances with different capacities to be cascaded cooperatively into an input 
stream cache hierarchy. For instance, the {{StreamDeserializer}} for pre-3.0 
sstables uses a {{MemoryCachedInputStream}} that falls back to a 
{{FileCachedInputStream}} when it runs out of in-memory capacity. The 
{{FileCachedInputStream}} may write a temporary buffer file to a data directory 
and removes it once the file is streamed successfully or the streaming fails.
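
For reviewers who want the general idea before reading the patch, below is a 
rough, self-contained sketch of a memory-then-file rewindable stream. It is 
not the {{CachedInputStream}} implementation: the class 
{{SpillingRewindableStream}}, its {{rewind()}} method, the mark fixed at 
construction, and the use of the default temp directory instead of a data 
directory are all simplifying assumptions of mine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;

// Hypothetical sketch: caches every byte read from the source so it can be replayed.
class SpillingRewindableStream extends InputStream {
    private final InputStream source;
    private final int memoryCapacity;                 // max bytes cached in memory
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File spillFile;                           // created lazily on overflow
    private OutputStream spillOut;                    // append handle to spillFile
    private InputStream replay;                       // non-null while replaying the cache

    SpillingRewindableStream(InputStream source, int memoryCapacity) {
        this.source = source;
        this.memoryCapacity = memoryCapacity;
    }

    @Override
    public int read() throws IOException {
        if (replay != null) {
            int cached = replay.read();
            if (cached != -1) return cached;
            replay.close();                           // cache exhausted, go back to the source
            replay = null;
        }
        int b = source.read();
        if (b != -1) cache(b);
        return b;
    }

    // Store one byte in memory, or in the spill file once memory is full.
    private void cache(int b) throws IOException {
        if (memory.size() < memoryCapacity) {
            memory.write(b);
            return;
        }
        if (spillFile == null) spillFile = File.createTempFile("stream-cache", ".buf");
        if (spillOut == null) spillOut = new FileOutputStream(spillFile, true);
        spillOut.write(b);                            // unbuffered for brevity
    }

    // Replay everything read so far, memory part first and then the spilled part.
    void rewind() throws IOException {
        if (spillOut != null) { spillOut.close(); spillOut = null; }
        if (replay != null) replay.close();
        replay = new ByteArrayInputStream(memory.toByteArray());
        if (spillFile != null) {
            replay = new SequenceInputStream(replay, new FileInputStream(spillFile));
        }
    }

    File spillFile() { return spillFile; }            // exposed for inspection only

    @Override
    public void close() throws IOException {
        if (replay != null) replay.close();
        if (spillOut != null) spillOut.close();
        if (spillFile != null) spillFile.delete();    // remove the temporary buffer file
        source.close();
    }
}
```

In this sketch the file is deleted in {{close()}}, so the caller is expected 
to close the stream on both the success and the failure path, for example 
with try-with-resources.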

This approach allows us to use the {{OldFormatDeserializer}} transparently, so 
the same code path used for reading pre-3.0 sstables is used to stream them. 
Note that the {{CachedInputStream}} is only used when streaming pre-3.0 
sstables, in order to provide rewind functionality, and will not affect existing 
behavior.

Please note that performance was not the objective here; the goal was mainly to 
support streaming of pre-3.0 sstables. Compact static tables may suffer a 
slight performance hit due to buffer copying and rewinding, but 
non-static-compact tables will not be affected, since the stream cache will 
not be used.

*Tests*

* *Unit tests*: Extended {{LegacySSTableTest}} to test streaming of legacy 
compact sstables since the jb version.
** Added a comprehensive test suite for the different {{CachedInputStream}} 
variants in {{RewindableDataInputStreamPlusTest}}
* *SSTable loader dtests*: Extended {{sstable_generation_loading_test}} to 
sstableload 2.1 (ka) sstables with different compression settings.
* *Upgrade dtests*: Extended CASSANDRA-10563 upgrade dtests to bootstrap soon 
after upgrading, to test bootstrap streaming of legacy sstables.

*TODO*

* Cleanup of leftover buffer files on startup.
* Improve documentation of {{CachedInputStream}}, {{MemoryCachedInputStream}} 
and {{FileCachedInputStream}}
* Make max memory buffer size a system property and change it on dtests
* {{LegacySSTableTest}} passes when executed individually but fails when 
executed as part of a suite, probably due to leftovers from a previous test 
that need to be cleaned up.
* Add la sstables to {{sstable_generation_loading_test}}
* Fix 
{{upgrade_8099_test.py:TestBootstrapAfterUpgrade.upgrade_with_wide_partition_test}}

||3.0||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:10990]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:10990]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-dtest/lastCompletedBuild/testReport/]|

[~philipthompson] when you have time, could you please set up a custom dtest run 
with the dtest branch above? Thanks!

> Support streaming of older version sstables in 3.0
> --------------------------------------------------
>
>                 Key: CASSANDRA-10990
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10990
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> In 2.0 we introduced support for streaming older versioned sstables 
> (CASSANDRA-5772).  In 3.0, because of the rewrite of the storage layer, this 
> became no longer supported.  So currently, while 3.0 can read sstables in the 
> 2.1/2.2 format, it cannot stream the older versioned sstables.  We should do 
> some work to make this still possible to be consistent with what 
> CASSANDRA-5772 provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
