[jira] [Created] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges

Mck SembWever (JIRA) Mon, 05 Sep 2011 00:40:02 -0700

Allow CFIF to keep going despite unavailable ranges
---------------------------------------------------


                 Key: CASSANDRA-3136
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
            Reporter: Mck SembWever
            Priority: Minor


From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

<use-case-1>
We use Cassandra as storage for web pages: we store the HTML, all
URLs that have the same HTML data, and some computed data. We run Hadoop
MR jobs to compute lexical and thematic data for each page and to
export the data to binary files for later use. A URL gets into
Cassandra on user request (a pageview), so if we delete a URL, it comes
back quickly if the page is active. Because of that, and because there
is a lot of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and contain only
fresh data, so we don't care about losing a node.
</use-case-1>

<use-case-2>
Trying to extract a small random sample (like a Pig SAMPLE) of data out of
Cassandra.
</use-case-2>

<use-case-3>
Searching for something or some pattern where one hit
is enough. If you get the hit, it's a positive result regardless of whether
ranges were ignored; if you don't, and you *know* a range was
ignored along the way, you can re-run the job later.
For example, such a job could be run at regular intervals during the day until
a hit is found.
</use-case-3>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
