You could (for now) store per-mapper counters in Table1.Standard1['cassandra']['frequency-mapperid'], one column per mapper. At the end, you do a get_slice and add them up. This is really bad for fault tolerance: you'll get wrong counts if mappers are restarted because of failures. But then, you'd have the same problem if you (transactionally) incremented a single counter too. This way, modulo failures, your answer is still correct.
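
A rough sketch of the per-mapper column idea in Java, in case it helps. The row is faked with an in-memory Map standing in for whatever get_slice would return, and values are kept as Strings since that is all the column can hold. The names here (PerMapperCounters, recordCount, total, the mapper ids) are made up for illustration, and no real Cassandra client calls are shown.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: one column per mapper, summed at the end.
class PerMapperCounters {
    // Column names look like "frequency-<mapperid>".
    static final String PREFIX = "frequency-";

    // Each mapper writes only its own column, so no two writers
    // ever touch the same column concurrently.
    static void recordCount(Map<String, String> row, String mapperId, long count) {
        row.put(PREFIX + mapperId, Long.toString(count));
    }

    // Stand-in for "do a get_slice and add them up": parse every
    // frequency-* column back into a number and sum the values.
    static long total(Map<String, String> row) {
        long sum = 0;
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (column.getKey().startsWith(PREFIX)) {
                sum += Long.parseLong(column.getValue());
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the row keyed by 'cassandra'.
        Map<String, String> row = new TreeMap<>();
        recordCount(row, "m1", 4);
        recordCount(row, "m2", 5);
        System.out.println(total(row)); // prints 9
    }
}
```

Nothing here protects against a failed mapper leaving a partial count behind, which is the caveat above.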

On Fri, Jul 17, 2009 at 8:41 AM, Jonathan Ellis<[email protected]> wrote:
> This is the kind of inconsistency that vector clocks can handle but
> the more simplistic timestamp-based resolution cannot.
>
> Of test-and-set vs vector clocks, vector clocks fit cassandra much better.
>
> -Jonathan
>
> On Fri, Jul 17, 2009 at 9:59 AM, Jun Rao<[email protected]> wrote:
>> This is a case where a test-and-set feature would be useful. See the
>> following JIRA. We just don't have it nailed down yet.
>> https://issues.apache.org/jira/browse/CASSANDRA-48
>>
>> Jun
>> IBM Almaden Research Center
>> K55/B1, 650 Harry Road, San Jose, CA 95120-6099
>>
>> [email protected]
>>
>> Ivan Chang <[email protected]>
>> 07/17/2009 07:14 AM
>>
>> Please respond to
>> [email protected]
>>
>> To
>> [email protected]
>> cc
>>
>> Subject
>> Concurrent updates
>>
>> I have the following scenario that I would like a best solution for.
>>
>> Here's the scenario:
>>
>> Table1.Standard1['cassandra']['frequency']
>>
>> It is used for keeping track of how many times the word "cassandra"
>> appeared.
>>
>> Let's say we have a bunch of articles stored in Hadoop. A Map/Reduce job
>> greps all articles throughout the Hadoop cluster that match the pattern
>> ^cassandra$ and updates Table1.Standard1['cassandra']['frequency']. Hence
>> Table1.Standard1['cassandra']['frequency'] will be updated concurrently.
>>
>> One of the issues I am facing is that
>> Table1.Standard1['cassandra']['frequency'] stores the count as a String
>> (I am using Java), so in order to update the frequency properly, the
>> thread that's running the Map/Reduce will have to retrieve
>> Table1.Standard1['cassandra']['frequency'] in its native String format,
>> hold that in temp (a Java String), convert it into an int, add the new
>> counts in, and finally
>>
>> "SET Table1.Standard1['cassandra']['frequency']. = '" + temp.toString() + ''"
>>
>> During the entire process, how do we guarantee concurrency? The Cql SET
>> does not allow something like
>>
>> SET Table1.Standard1['cassandra']['frequency']. =
>> Table1.Standard1['cassandra']['frequency']. + newCounts
>>
>> since there's only one String type.
>>
>> What would be the best solution in this situation?
>>
>> Thanks,
>> Ivan
