[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk

Pavel Yaskevich (Commented) (JIRA) Sun, 22 Jan 2012 11:53:05 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190751#comment-13190751
 ]


Pavel Yaskevich commented on CASSANDRA-2392:
--------------------------------------------

bq. But the main idea is to reduce the code and the checks which we have to do 
just to populate the first and last variables. IMO it is better served in Index 
Summary, which already has the needed checks. By using maybeAddEntry() and 
marking the others private everywhere, we don't need extra checks elsewhere to 
populate the fields... first and last in an index is also a summary :)

Correct me if I'm wrong, but as I see in SSTableReader.load(...), the condition 
"SSTable.last == IndexSummary.last" is not guaranteed, which means that 
IndexSummary.last has different semantics from SSTable.last. As for the 
checks - I don't see many of those, and IndexSummary in its current state does 
not have anything to do with SSTable's last/first variables, so I don't really 
understand what checks you are talking about. If you really want to be pedantic 
about the domain of first/last - I agree that they could belong to the summary 
of the SSTable but certainly not to the "index" one :)
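
To illustrate why the two "last" values can differ, here is a minimal sketch. 
It assumes the summary keeps only every N-th key it is offered. SampledSummary, 
its fields, and the string keys are hypothetical stand-ins for illustration, 
not Cassandra's actual IndexSummary or SSTableReader code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a sampled index summary; NOT Cassandra's IndexSummary.
class SampledSummary {
    private final int interval;                          // assumed rule: keep every interval-th key
    private final List<String> samples = new ArrayList<>();
    private long keysSeen = 0;

    SampledSummary(int interval) {
        this.interval = interval;
    }

    // Records the key only when it falls on a sampling boundary.
    void maybeAddEntry(String key) {
        if (keysSeen % interval == 0) {
            samples.add(key);
        }
        keysSeen++;
    }

    String first() { return samples.get(0); }

    String last() { return samples.get(samples.size() - 1); }

    public static void main(String[] args) {
        // Keys in the order they would appear in the primary index (made-up data).
        List<String> sstableKeys = Arrays.asList("a", "b", "c", "d", "e", "f");

        SampledSummary summary = new SampledSummary(4);
        for (String key : sstableKeys) {
            summary.maybeAddEntry(key);
        }

        String sstableLast = sstableKeys.get(sstableKeys.size() - 1);
        System.out.println("summary.last = " + summary.last());  // last sampled key
        System.out.println("sstable.last = " + sstableLast);     // true last key
    }
}
```

With six keys and an interval of 4, only the keys at positions 0 and 4 are 
sampled, so the summary's last entry is "e" while the last key overall is "f". 
Under this assumption the first key always matches, but the last one matches 
only when the key count happens to line up with the interval.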

bq. Because we read from the disk to populate the Index Summary? If yes i can 
make sure that both the patches go into the same release.

Because we would end up reading more data (e.g. some of the keys and all index 
and data positions would be read twice) from different files - primary_index 
and summary.
                
> Saving IndexSummaries to disk
> -----------------------------
>
>                 Key: CASSANDRA-2392
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Chris Goffinet
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-re-factor-first-and-last.patch, 
> 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch, 
> 0002-save-summaries-to-disk-v3.patch, 0002-save-summaries-to-disk.patch
>
>
> For nodes with millions of keys, doing rolling restarts that take over 10 
> minutes per node can be painful if you have a 100-node cluster. All of our time 
> is spent on doing index summary computations on startup. It would be great if 
> we could save those to disk as well. Our indexes are quite large.

