[
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189046#comment-13189046
]
Pavel Yaskevich commented on CASSANDRA-2392:
--------------------------------------------
Thanks for the patch! Here is my review:
- Loading index summaries in SSTableReader.load(boolean, Set<DecoratedKey>)
breaks key cache pre-load.
- The IndexSummary deserialize(...) method should be made static and return an
IndexSummary object. This would also allow dropping the IndexSummary argument
from SSTableReader.loadSummaries(...).
- To avoid any seeks in the PRIMARY_INDEX file on IndexSummary.deserialize, I
suggest saving the key (only the ByteBuffer (BB) part) as well as the index
position in IndexSummary.serialize.
- I would also suggest saving dataPosition from the primary index into the
summaries file, to avoid adding serialization to SegmentedFile: SegmentedFile
serialize(...)/deserialize(...) are not really serialize/deserialize, they just
save/read boundaries. This way you would be able to do deserialization and
boundary load at the same time without saving/reading additional information
to/from disk, because only ibuilder needs indexPosition and only dbuilder needs
dataPosition.
- loadSummaries should be renamed to something more appropriate, because the
method does not only load index summaries; it also loads the index and data
builders. Strictly speaking, it does not really load them but rather
deserializes boundaries into an existing object, which is not good practice.
- Can you please explain this chunk of code to me?
{code}
+ // don't rename summaries as it is not created yet and created while it is loaded.
+ for (Component component : Sets.difference(components, Sets.newHashSet(Component.DATA, Component.SUMMARIES)))
      FBUtilities.renameWithConfirm(tmpdesc.filenameFor(component), newdesc.filenameFor(component));
{code}
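
To make the serialization suggestions above concrete, here is a minimal sketch of what I have in mind. It is not the actual IndexSummary code: the class name SummarySketch, the Entry type, the on-disk layout, and the roundTrip helper are all assumptions made up for illustration. It shows a static deserialize that returns a new object, with each entry carrying the raw key bytes, the index position, and the data position, so that nothing has to be read back from the primary index.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for IndexSummary; names and layout are assumptions. */
final class SummarySketch {
    /** One sampled entry: raw key bytes plus offsets into the index and data files. */
    static final class Entry {
        final ByteBuffer key;
        final long indexPosition;
        final long dataPosition;

        Entry(ByteBuffer key, long indexPosition, long dataPosition) {
            this.key = key;
            this.indexPosition = indexPosition;
            this.dataPosition = dataPosition;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(ByteBuffer key, long indexPosition, long dataPosition) {
        entries.add(new Entry(key, indexPosition, dataPosition));
    }

    List<Entry> entries() {
        return entries;
    }

    /** Writes the entry count, then key length, key bytes, and both positions per entry. */
    void serialize(DataOutputStream out) throws IOException {
        out.writeInt(entries.size());
        for (Entry e : entries) {
            ByteBuffer key = e.key.duplicate(); // leave the caller's buffer position untouched
            byte[] bytes = new byte[key.remaining()];
            key.get(bytes);
            out.writeInt(bytes.length);
            out.write(bytes);
            out.writeLong(e.indexPosition);
            out.writeLong(e.dataPosition);
        }
    }

    /** Static factory: builds and returns a new summary instead of filling an existing one. */
    static SummarySketch deserialize(DataInputStream in) throws IOException {
        SummarySketch summary = new SummarySketch();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            byte[] bytes = new byte[in.readInt()];
            in.readFully(bytes);
            long indexPosition = in.readLong();
            long dataPosition = in.readLong();
            summary.add(ByteBuffer.wrap(bytes), indexPosition, dataPosition);
        }
        return summary;
    }

    /** Illustration-only helper: serializes to memory and reads the result back. */
    static SummarySketch roundTrip(SummarySketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.serialize(new DataOutputStream(buffer));
            return deserialize(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With entries shaped like this, a caller could feed indexPosition to ibuilder and dataPosition to dbuilder in the same loop that rebuilds the summary, which is the reason for storing both positions.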
> Saving IndexSummaries to disk
> -----------------------------
>
> Key: CASSANDRA-2392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Goffinet
> Assignee: Vijay
> Priority: Minor
> Fix For: 1.1
>
> Attachments: 0001-re-factor-first-and-last.patch,
> 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk.patch
>
>
> For nodes with millions of keys, doing rolling restarts that take over 10
> minutes per node can be painful if you have a 100-node cluster. All of our
> time is spent on index summary computations at startup. It would be great if
> we could save those to disk as well. Our indexes are quite large.