[ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189046#comment-13189046
 ] 

Pavel Yaskevich commented on CASSANDRA-2392:
--------------------------------------------

Thanks for the patch! Here is my review:

- Loading index summaries in SSTableReader.load(boolean, Set<DecoratedKey>) 
breaks key cache pre-load.

- The IndexSummary deserialize(...) method should be made static and return an 
IndexSummary object. This would also allow dropping the IndexSummary argument 
from SSTableReader.loadSummaries(...).

- To avoid any seeks in the PRIMARY_INDEX file during IndexSummary.deserialize, 
I suggest saving the key (only the BB, i.e. ByteBuffer, part) as well as the 
index position in IndexSummary.serialize.

- I would also suggest saving dataPosition from the primary index into the 
summaries file, to avoid adding serialization to SegmentedFile: the 
SegmentedFile serialize(...)/deserialize(...) methods are not really 
serialization/deserialization, they just save/read boundaries. This way you 
could deserialize the summary and load the boundaries at the same time, 
without saving/reading additional information to/from disk, because ibuilder 
only needs indexPosition and dbuilder only needs dataPosition.

- loadSummaries should be renamed to something more appropriate, because that 
method does not only load index summaries: it also loads the index and data 
builders. Strictly speaking, it does not load them but rather deserializes 
boundaries into an existing object, which is not good practice.

- Can you please explain this chunk of code to me?
{code}
+            // don't rename summaries as it is not created yet and created while it is loaded.
+            for (Component component : Sets.difference(components, Sets.newHashSet(Component.DATA, Component.SUMMARIES)))
                  FBUtilities.renameWithConfirm(tmpdesc.filenameFor(component), newdesc.filenameFor(component));
{code}
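
To make the serialization suggestions above concrete, here is a minimal sketch 
of what I have in mind. It is not the actual IndexSummary class: SummarySketch, 
Entry, toBytes and fromBytes are hypothetical names, and the on-disk layout 
(entry count, then key length, key bytes, indexPosition, dataPosition per 
entry) is only an assumption for illustration. The point is that deserialize 
is static, returns a new object, and needs no seeks in the primary index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for IndexSummary; names and layout are assumptions.
final class SummarySketch
{
    // One sampled entry: raw key bytes plus offsets into the index and data files.
    static final class Entry
    {
        final ByteBuffer key;
        final long indexPosition;
        final long dataPosition;

        Entry(ByteBuffer key, long indexPosition, long dataPosition)
        {
            this.key = key;
            this.indexPosition = indexPosition;
            this.dataPosition = dataPosition;
        }
    }

    final List<Entry> entries = new ArrayList<>();

    void add(ByteBuffer key, long indexPosition, long dataPosition)
    {
        entries.add(new Entry(key, indexPosition, dataPosition));
    }

    // Writes everything needed to rebuild the summary and both sets of boundaries.
    void serialize(DataOutputStream out) throws IOException
    {
        out.writeInt(entries.size());
        for (Entry e : entries)
        {
            ByteBuffer key = e.key.duplicate(); // leave the caller's position untouched
            byte[] bytes = new byte[key.remaining()];
            key.get(bytes);
            out.writeInt(bytes.length);
            out.write(bytes);
            out.writeLong(e.indexPosition);
            out.writeLong(e.dataPosition);
        }
    }

    // Static factory: returns a fresh object instead of filling an existing one.
    static SummarySketch deserialize(DataInputStream in) throws IOException
    {
        SummarySketch summary = new SummarySketch();
        int count = in.readInt();
        for (int i = 0; i < count; i++)
        {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            long indexPosition = in.readLong();
            long dataPosition = in.readLong();
            summary.add(ByteBuffer.wrap(bytes), indexPosition, dataPosition);
        }
        return summary;
    }

    // In-memory round-trip helpers, only here to keep the sketch self-contained.
    byte[] toBytes()
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer))
        {
            serialize(out);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static SummarySketch fromBytes(byte[] data)
    {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data)))
        {
            return deserialize(in);
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
    }
}
```

With entries shaped like this, a caller could feed indexPosition to ibuilder 
and dataPosition to dbuilder while iterating over the deserialized entries, so 
SegmentedFile would not need its own serialize/deserialize.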


                
> Saving IndexSummaries to disk
> -----------------------------
>
>                 Key: CASSANDRA-2392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Goffinet
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-re-factor-first-and-last.patch, 
> 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk.patch
>
>
> For nodes with millions of keys, doing rolling restarts that take over 10 
> minutes per node can be painful if you have a 100-node cluster. All of our 
> time is spent on doing index summary computations on startup. It would be 
> great if we could save those to disk as well. Our indexes are quite large.


        
