[jira] Updated: (CASSANDRA-1155) keep persistent row statistics

Brandon Williams (JIRA) Thu, 22 Jul 2010 11:53:17 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Brandon Williams updated CASSANDRA-1155:
----------------------------------------

    Attachment: 1155-v3.txt

v3 builds on v2: it finishes the TODOs in CFS (ColumnFamilyStore) and loads 
the persistent statistics into the SSTableReader when opening an existing 
SSTable.  It has a deadlock when flushing, though: a flush of A goes to write 
its stats, but meanwhile B has acquired the flusherlock in preparation to 
flush.  A can't acquire the lock to do the stats write, and B can't release 
the lock because we only allow N flushes at a time.  Because that's pretty 
hairy, I'm going to go the route of storing the statistics in a separate 
-Statistics.db file, but I'm posting this patch in case it turns out to be 
useful later.
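
To make the lock cycle described above concrete, here is a minimal sketch. 
It is not code from the patch: the class name, the Semaphore standing in for 
the "N flushes at a time" limit (with N = 1), and the ReentrantLock standing 
in for the flusherlock are all assumptions for illustration.  Timeouts and 
latches replace the indefinite waits so the sketch terminates instead of 
hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the flush path; none of these names come from the patch.
final class FlushDeadlockSketch {

    /** Returns true when flush A and flush B each wait on what the other holds. */
    static boolean cycleOccurs() {
        final Semaphore flushSlots = new Semaphore(1);          // assumed limit: N = 1
        final ReentrantLock flusherLock = new ReentrantLock();  // stand-in for the flusherlock
        final CountDownLatch bHoldsLock = new CountDownLatch(1);
        final CountDownLatch aGaveUp = new CountDownLatch(1);
        final AtomicBoolean bStarved = new AtomicBoolean(false);

        try {
            flushSlots.acquire();                // flush A occupies the only flush slot

            Thread b = new Thread(() -> {
                flusherLock.lock();              // flush B takes the lock to prepare its flush
                try {
                    bHoldsLock.countDown();
                    // B needs a flush slot, but A still occupies it.
                    boolean gotSlot = flushSlots.tryAcquire(100, TimeUnit.MILLISECONDS);
                    if (gotSlot) {
                        flushSlots.release();
                    }
                    bStarved.set(!gotSlot);
                    aGaveUp.await();             // keep holding the lock, as a blocked B would
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    flusherLock.unlock();
                }
            });
            b.start();
            bHoldsLock.await();

            // Flush A now wants the lock to write its statistics.
            boolean aGotLock = flusherLock.tryLock(100, TimeUnit.MILLISECONDS);
            if (aGotLock) {
                flusherLock.unlock();
            }
            aGaveUp.countDown();

            b.join();
            flushSlots.release();                // A only frees its slot after giving up
            return !aGotLock && bStarved.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cycle occurs: " + cycleOccurs());
    }
}
```

With untimed lock() and acquire() calls in place of the timed ones, neither 
thread would ever make progress.  Writing the statistics to a separate file 
sidesteps the problem if that write does not need the lock.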

> keep persistent row statistics
> ------------------------------
>
>                 Key: CASSANDRA-1155
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1155
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Brandon Williams
>             Fix For: 0.7 beta 1
>
>         Attachments: 1155-v2.txt, 1155-v3.txt, 1155.txt
>
>
> During flush and compaction we should keep row statistics (column count 
> and row size) using EstimatedHistogram, replacing the min/max/total sizes 
> in CFS.
> Having this detail will let us estimate, given an index CF, how many nodes we 
> need to query to get the number of matching rows requested by the client.
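
As a rough illustration of the idea in the description, the sketch below 
keeps a bucketed histogram of per-row values and uses it for a node-count 
estimate.  It is not Cassandra's EstimatedHistogram: the class name, the 
1.2x bucket growth, and the nodesToQuery helper are assumptions made for the 
example, as is the reading that the column count of an index row 
approximates the matches one node can supply.

```java
import java.util.Arrays;

// Hypothetical sketch; not the EstimatedHistogram class from the Cassandra source.
final class RowStatsHistogramSketch {
    private final long[] offsets; // inclusive upper bound of each bucket
    private final long[] counts;  // one extra slot at the end for overflow

    RowStatsHistogramSketch(int bucketCount) {
        if (bucketCount < 1) {
            throw new IllegalArgumentException("bucketCount must be >= 1");
        }
        offsets = new long[bucketCount];
        long last = 1;
        offsets[0] = last;
        for (int i = 1; i < bucketCount; i++) {
            long next = Math.round(last * 1.2); // assumed growth factor
            if (next == last) {
                next++;                         // keep bounds strictly increasing
            }
            offsets[i] = next;
            last = next;
        }
        counts = new long[bucketCount + 1];
    }

    /** Records one observation, e.g. a row's column count or its size in bytes. */
    void add(long value) {
        int idx = Arrays.binarySearch(offsets, value);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first bucket whose bound is >= value
        }
        counts[idx]++;      // idx == offsets.length means the overflow bucket
    }

    /** Total number of observations, including overflow. */
    long count() {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        return total;
    }

    /** Mean rounded up, using bucket upper bounds; overflow entries are ignored. */
    long estimatedMean() {
        long sum = 0;
        long n = 0;
        for (int i = 0; i < offsets.length; i++) {
            sum += counts[i] * offsets[i];
            n += counts[i];
        }
        return n == 0 ? 0 : (sum + n - 1) / n;
    }

    /** Hypothetical helper: nodes needed if each supplies about matchesPerNode rows. */
    static long nodesToQuery(long requestedRows, long matchesPerNode) {
        if (matchesPerNode <= 0) {
            throw new IllegalArgumentException("matchesPerNode must be positive");
        }
        return (requestedRows + matchesPerNode - 1) / matchesPerNode;
    }
}
```

A caller might feed each row's column count to add() during flush or 
compaction, then pass estimatedMean() as matchesPerNode when planning an 
index query.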

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
