Allow CFIF to keep going despite unavailable ranges
---------------------------------------------------
Key: CASSANDRA-3136
URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
Project: Cassandra
Issue Type: Improvement
Components: Hadoop
Reporter: Mck SembWever
Priority: Minor
From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902
<use-case-1>
We use Cassandra as storage for web pages: we store the HTML, all
URLs that have the same HTML data, and some computed data. We run Hadoop
MR jobs to compute lexical and thematic data for each page and to
export the data to binary files for later use. A URL gets into
Cassandra on user request (a pageview), so if we delete a URL, it comes
back quickly if the page is active. Because of that, and because there
is a lot of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and contain only
fresh data, so we don't care about losing a node.
</use-case-1>
<use-case-2>
Trying to extract a small random sample (like a Pig SAMPLE) of data out of
Cassandra.
</use-case-2>
<use-case-3>
Searching for something, or some pattern, where one hit
is enough. If you get the hit, it's a positive result regardless of whether
ranges were ignored; if you don't, and you *know* a range was
ignored along the way, you can re-run the job later.
For example, such a job could be run at regular intervals during the day
until a hit is found.
</use-case-3>
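
The logic of use case 3 can be sketched roughly as follows. This is only an
illustration of the "skip unavailable ranges, but remember that you did"
idea, not CFIF code: `RangeUnavailable`, `SearchResult`, `search_ranges`,
`read_range` and `matches` are all hypothetical names invented here, and
token ranges are modelled as plain tuples.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

TokenRange = Tuple[int, int]  # hypothetical stand-in for a token range


class RangeUnavailable(Exception):
    """Hypothetical error: no live replica can serve this range."""


@dataclass
class SearchResult:
    hit: Optional[str] = None                      # first matching row, if any
    skipped: List[TokenRange] = field(default_factory=list)

    @property
    def conclusive(self) -> bool:
        # A hit is always conclusive; a miss is conclusive only if
        # no range had to be skipped along the way.
        return self.hit is not None or not self.skipped


def search_ranges(
    ranges: Iterable[TokenRange],
    read_range: Callable[[TokenRange], Iterable[str]],
    matches: Callable[[str], bool],
) -> SearchResult:
    """Scan ranges in order, skipping unavailable ones, and stop at the first hit."""
    result = SearchResult()
    for token_range in ranges:
        try:
            # Materialise inside the try block so lazy readers fail here too.
            rows = list(read_range(token_range))
        except RangeUnavailable:
            result.skipped.append(token_range)     # keep going, but record it
            continue
        for row in rows:
            if matches(row):
                result.hit = row
                return result
    return result
```

A caller would treat a result with a hit as final, and a miss with a
non-empty `skipped` list as inconclusive, re-running the job later as
described above.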