Distinct Counter Proposal for Cassandra

2012-06-13 Thread Utku Can Topçu
Hi All, Let's assume we have a use case where we need to count the number of columns for a given key. Let's say the key is the URL and the column-name is the IP address or any cardinality identifier. The straightforward implementation seems to be simple, just inserting the IP addresses as
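A minimal sketch of the straightforward approach described in this message, assuming an in-memory stand-in for a row keyed by URL whose column names are IP addresses. The helpers `record_visit` and `distinct_count` are hypothetical and are not Cassandra API calls.

```python
from collections import defaultdict

# In-memory stand-in for a column family: row key (URL) -> set of column names (IPs).
# This only illustrates the data model; it is not a Cassandra client.
rows = defaultdict(set)

def record_visit(url, ip):
    """Insert the IP as a column name; re-inserting the same IP adds nothing."""
    rows[url].add(ip)

def distinct_count(url):
    """Count the columns in the row, i.e. the number of distinct IPs seen."""
    return len(rows[url])

record_visit("http://example.com/", "10.0.0.1")
record_visit("http://example.com/", "10.0.0.2")
record_visit("http://example.com/", "10.0.0.1")  # duplicate, no new column
print(distinct_count("http://example.com/"))  # 2
```

The cost of this approach grows with the number of distinct values per key, which is presumably what motivates the approximate structures mentioned in the reply.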

Re: Distinct Counter Proposal for Cassandra

2012-06-13 Thread Utku Can Topçu
may need some work. Another alternative is the self-learning bitmap (http://ect.bell-labs.com/who/aychen/sbitmap4p.pdf) which, in my understanding, is more memory efficient when counting small values. Yuki On Wednesday, June 13, 2012 at 11:28 AM, Utku Can Topçu wrote: Hi All, Let's assume we

Re: last record rowId

2011-06-15 Thread Utku Can Topçu
As far as I can tell, this functionality doesn't exist. However, you can use such a method to insert the rowId into another column within a separate row, and request the latest column. I think this would work for you. However, every insert would need a get request, which I think would be

Re: Corrupted Counter Columns

2011-05-28 Thread Utku Can Topçu
, 2011 at 1:59 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Hello, I'm using the 0.8.0-rc1, with RF=2 and 4 nodes. Strangely counters are corrupted. Say, the actual value should be : 51664 and the value

Re: expiring + counter column?

2011-05-28 Thread Utku Can Topçu
How about implementing a freezing mechanism on counter columns? If there are no more increments within freeze seconds after the last increment (it would be on the order of a day or so), the column would lock itself and won't accept further increments. And after this freeze period, the ttl should

Corrupted Counter Columns

2011-05-26 Thread Utku Can Topçu
Hello, I'm using the 0.8.0-rc1, with RF=2 and 4 nodes. Strangely counters are corrupted. Say, the actual value should be : 51664 and the value that cassandra sometimes outputs is: either 51664 or 18651001. And I have no idea on how to diagnose the problem or reproduce it. Can you help me

Re: Corrupted Counter Columns

2011-05-26 Thread Utku Can Topçu
Some additional information on the settings: I'm using CL.ONE for both reading and writing; and replicate_on_write is true on the Counters CF. I think the problem occurs after a restart when the commitlogs are read. On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote

CounterColumn increments gone after restart

2011-05-12 Thread Utku Can Topçu
Hi guys, I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the way it should be but: - I create a ColumnFamily named Counters - do a few increments on a column. - kill cassandra - start cassandra When I look at the counter column, the value is 1. See the following pastebin

Re: CounterColumn increments gone after restart

2011-05-12 Thread Utku Can Topçu
see the ticket https://issues.apache.org/jira/browse/CASSANDRA-2642 please On Thu, May 12, 2011 at 3:28 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi guys, I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the way it should be but: - I create a ColumnFamily named

Does counter columns support TTL

2011-02-17 Thread Utku Can Topçu
Hi All, I'm experimenting and developing using counters. However, I've come to a use case where I need counters to expire and get deleted after a certain time of inactivity (i.e. have the counter column deleted one hour after the last increment). As far as I can tell counter columns don't have TTL in

Re: Commercial support for cassandra

2011-02-17 Thread Utku Can Topçu
http://wiki.apache.org/cassandra/ThirdPartySupport On Thu, Feb 17, 2011 at 12:20 AM, Sal Fuentes fuente...@gmail.com wrote: They also offer great training sessions. Have a look at their site for more information: http://www.datastax.com/about-us On Wed, Feb 16, 2011 at 3:13 PM, Michael

Re: Does counter columns support TTL

2011-02-17 Thread Utku Can Topçu
Can anyone confirm that this patch works with the current trunk? On Thu, Feb 17, 2011 at 4:16 PM, Sylvain Lebresne sylv...@datastax.com wrote: https://issues.apache.org/jira/browse/CASSANDRA-2103 On Thu, Feb 17, 2011 at 4:05 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi All, I'm

Re: Does counter columns support TTL

2011-02-17 Thread Utku Can Topçu
And I think this patch would still be useful and legitimate if the TTL of the initial increment is taken into account. On Thu, Feb 17, 2011 at 6:11 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Yes, I've read the discussion. My use-case is similar to the use-case of the contributor. So that's

Re: Implemeting a LRU in Cassandra

2011-02-10 Thread Utku Can Topçu
. Would that work for you? Aaron On 9 Feb 2011, at 23:58, Utku Can Topçu wrote: Hi All, I'm sure people here have tried to solve similar questions. Say I'm tracking pages, I want to access the least recently used 1000 unique pages (i.e. columnnames). How can I achieve this? Using

Re: Super Slow Multi-gets

2011-02-10 Thread Utku Can Topçu
Dear Bill, How about the size of the row in the Messages CF. Is it too big? Might you be having an overhead of the bandwidth? Regards, Utku On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs bill.spe...@gmail.com wrote: I have a 7 node setup with a replication factor of 1 and a read consistency of

Re: Super Slow Multi-gets

2011-02-10 Thread Utku Can Topçu
Speirs bill.spe...@gmail.com wrote: Each message row is well under 1K. So I don't think it is network... plus all boxes are on a fast LAN. Bill- On Feb 10, 2011 11:59 AM, Utku Can Topçu u...@topcu.gen.tr wrote: Dear Bill, How about the size of the row in the Messages CF. Is it too

Implemeting a LRU in Cassandra

2011-02-09 Thread Utku Can Topçu
Hi All, I'm sure people here have tried to solve similar questions. Say I'm tracking pages, I want to access the least recently used 1000 unique pages (i.e. columnnames). How can I achieve this? Using a row with say, ttl=60 seconds would solve the problem of accessing the least recently used

Re: Hadoop Integration doesn't work when one node is down

2011-01-02 Thread Utku Can Topçu
I've created an issue, was this what you were asking Jonathan? https://issues.apache.org/jira/browse/CASSANDRA-1927 On Mon, Jan 3, 2011 at 12:24 AM, Jonathan Ellis jbel...@gmail.com wrote: Can you create one? On Sun, Jan 2, 2011 at 4:39 PM, mck m...@apache.org wrote: Is this a bug or

Re: Replacing nodes of the cluster in 0.7.0-RC1

2010-12-05 Thread Utku Can Topçu
Since no reply came in a few days, I tried my proposed steps and it all worked fine. Just to let you know. On Sat, Dec 4, 2010 at 10:31 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi All, I'm currently not happy with the hardware and the operating system of our 4-node cassandra cluster. I'm

Replacing nodes of the cluster in 0.7.0-RC1

2010-12-04 Thread Utku Can Topçu
Hi All, I'm currently not happy with the hardware and the operating system of our 4-node cassandra cluster. I'm planning to move the cluster to a different hardware/OS architecture. For this purpose I'm planning to bring up 4 new nodes, so that each node will be a replacement of another node in

Detecting failed nodes and restarting

2010-12-02 Thread Utku Can Topçu
Hi All, The question is really simple. Is there anyone out there using a set of scripts in production that detects failures of cassandra processes and restarts them or takes required actions. If so, how can we implement a generic solution for this problem? Regards, Utku

Deleting the datadir for system keyspace in 0.7

2010-11-15 Thread Utku Can Topçu
Hello All, I'm wondering before restarting a node in a cluster: if I delete the system keyspace, what data would I be losing? Would I be losing anything? Regards, Utku

Re: Deleting the datadir for system keyspace in 0.7

2010-11-15 Thread Utku Can Topçu
. Everything but the hints can be replaced. Gary. On Mon, Nov 15, 2010 at 06:29, Utku Can Topçu u...@topcu.gen.tr wrote: Hello All, I'm wondering before restarting a node in a cluster. If I delete the system keyspace, what data would I be losing, would I be losing anything

Cassandra Hadoop Integration not compatible with Hadoop 0.21.0

2010-11-05 Thread Utku Can Topçu
When I try to read a CF from Hadoop, just after issuing the run I get this error: Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at

Re: Time to wait for CF to be consistent after stopping writes.

2010-10-28 Thread Utku Can Topçu
...@gmail.com wrote: On Wed, Oct 27, 2010 at 05:08, Utku Can Topçu u...@topcu.gen.tr wrote: Hi, For a columnfamily in a keyspace which has RF=3, I'm issuing writes with ConsistencyLevel.ONE. in the configuration I have: - memtable_flush_after_mins : 30 - memtable_throughput_in_mb : 32

Time to wait for CF to be consistent after stopping writes.

2010-10-27 Thread Utku Can Topçu
Hi, For a columnfamily in a keyspace which has RF=3, I'm issuing writes with ConsistencyLevel.ONE. in the configuration I have: - memtable_flush_after_mins : 30 - memtable_throughput_in_mb : 32 I'm writing to this columnfamily continuously for about 1 hour then stop writing. So the question

Reading a keyrange when using RP

2010-10-21 Thread Utku Can Topçu
If I'm not mistaken, cassandra has been providing support for keyrange queries also on RP. However, when I try to define a keyrange such as (start: key100, end: key200), I get an error like: InvalidRequestException(why:start key's md5 sorts after end key's md5. this is not allowed; you probably
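The error text above can be illustrated with a short sketch. It assumes, as a simplification, that the partitioner orders keys by their MD5 digest read as an unsigned integer; `md5_token` and `is_valid_range` are hypothetical helpers and not part of any Cassandra client.

```python
import hashlib

def md5_token(key: bytes) -> int:
    """Read the MD5 digest of the key as an unsigned integer (a simplification)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")

def is_valid_range(start: bytes, end: bytes) -> bool:
    """In this sketch a range is accepted only if start's token does not sort after end's."""
    return md5_token(start) <= md5_token(end)

keys = [b"key100", b"key150", b"key200"]
# Sorting by token generally gives a different order than sorting by key.
by_token = sorted(keys, key=md5_token)
print(by_token)
print(is_valid_range(b"key100", b"key200"))
```

Under this assumption, whether a range is accepted depends on the hashes of the two endpoints rather than on their lexical order, so a range that looks ordered by key can still be rejected.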

creating and dropping columnfamilies as a usecase

2010-10-21 Thread Utku Can Topçu
Hi All, In the current project I'm working on, I have a use case for analyzing the rows hourly. Since the 0.7x branch supports creating and dropping columnfamilies on the fly, my use case proposal will be: * Create a CF at the very beginning of every hour * At the end of the 1-hour period,

using jna.jar Unknown mlockall error 0

2010-10-08 Thread Utku Can Topçu
Hi, In order to continue on memory optimizations, I've been trying to use the JNA. However, when I copy the jna.jar to the lib directory, I get the warning. I'm currently running the 0.6.5 version of cassandra. WARN [main] 2010-10-08 09:16:18,924 FBUtilities.java (line 595) Unknown mlockall

Re: using jna.jar Unknown mlockall error 0

2010-10-08 Thread Utku Can Topçu
I'm running an Ubuntu 9.10 linux box. On Fri, Oct 8, 2010 at 11:33 AM, Roger Schildmeijer schildmei...@gmail.com wrote: On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi, In order to continue on memory optimizations, I've been trying to use the JNA. However, when

Re: using jna.jar Unknown mlockall error 0

2010-10-08 Thread Utku Can Topçu
that mlockall error 0. Maybe there is another solution anyway. nico008 On 08/10/2010 11:33, Roger Schildmeijer wrote: On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi, In order to continue on memory optimizations, I've been trying to use the JNA. However

Re: Tuning cassandra to use less memory

2010-10-06 Thread Utku Can Topçu
Hi Oleg, I've also been looking into these after some research. I've been tackling this with: 1. Setting the default max and min heap from 1G to 1500M. 2. I'm not using row caches, and the key caches are set to 1000, before they were 200K as default 3. I've lowered the memtable throughput to 32MB 4.

Re: A proposed use case, any comments and experience is appreciated

2010-10-04 Thread Utku Can Topçu
. On Mon, Oct 4, 2010 at 8:48 AM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi Jonathan, Thank you for mentioning the expiring columns issue. I didn't know that it existed. That's really great news. First of all, does the current 0.6 branch support it? If not so, is the patch

Re: A proposed use case, any comments and experience is appreciated

2010-10-04 Thread Utku Can Topçu
. On Mon, Oct 4, 2010 at 5:12 AM, Utku Can Topçu u...@topcu.gen.tr wrote: Hey All, I'm planning to run Map/Reduce on one of the ColumnFamilies. The keys are formed in such a fashion that, they are indexed in descending order by time. So I'll be analyzing the data for every hour iteratively

Hardware change of a node in the cluster

2010-10-04 Thread Utku Can Topçu
Hey All, Recently I've tried to upgrade (hw upgrade) one of the nodes in my cassandra cluster from ec2-small to ec2-large. However, there were problems: since the IP of the new instance was different from that of the previous instance, the other nodes did not recognize it in the ring. So what should

Best strategy for adding new nodes to the cluster

2010-09-27 Thread Utku Can Topçu
Hi All, We're currently running a cassandra cluster with Replication Factor 3, consisting of 4 nodes. The current situation is: - The nodes are all identical (AWS small instances) - Data directory is in the partition (/mnt) which has 150G capacity and each node has around 90 GB load, so 60 G

Having different 0.6.x instances in one Cassandra cluster

2010-08-05 Thread Utku Can Topçu
Hi All, I'm planning to use the current 0.6.4 stable for creating an image that would be the base for nodes in our Cassandra cluster. However, the 0.6.5 release is on the way. Once 0.6.5 has been released, is it possible to have some of the nodes stay on 0.6.4 and have new nodes on 0.6.5?

Lucene CassandraDirectory Implementation

2010-07-22 Thread Utku Can Topçu
Hi All, I was browsing through the Lucene JIRA and came across the issue named A Column-Oriented Cassandra-Based Lucene Directory at https://issues.apache.org/jira/browse/LUCENE-2456 Has anyone had a chance to test it? If so, do you think it's an efficient implementation as a replacement for the

Cassandra Data Model Design Visualization

2010-06-29 Thread Utku Can Topçu
Hey Guys, I've been designing an application which consists of more than 20 ColumnFamilies. Each ColumnFamily has some columns referencing keys in other ColumnFamilies; some keys in a ColumnFamily are combinations of keys/columns in other ColumnFamilies. I guess most of the people are

Implementing Counter on Cassandra

2010-06-29 Thread Utku Can Topçu
Hey Guys, Currently in a project I'm involved in, I need to have some columns holding incremented data. The easy approach for implementing a counter with increments, as far as I have figured out, is read - increment - insert; however, this approach is not an atomic operation and can easily be
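A minimal sketch of why read - increment - insert is not atomic, assuming an in-memory dictionary as a stand-in for a single column value. The `read` and `insert` helpers are hypothetical and do not correspond to any Cassandra client call.

```python
# In-memory stand-in for a single column value; not a Cassandra client.
store = {"hits": 0}

def read(column):
    return store[column]

def insert(column, value):
    store[column] = value

# Two clients interleave: both read before either writes.
a_seen = read("hits")        # client A reads 0
b_seen = read("hits")        # client B reads 0
insert("hits", a_seen + 1)   # client A writes 1
insert("hits", b_seen + 1)   # client B also writes 1, overwriting A's increment

print(store["hits"])  # 1, although two increments were attempted
```

The second write is computed from a stale read, so one increment is lost; avoiding this needs either coordination between clients or a server-side increment.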

Getting keys in a range sorted with respect to last access time

2010-06-07 Thread Utku Can Topçu
Hey All, First of all I'll start with some questions on the default behavior of get_range_slices method defined in the thrift API. Given a keyrange with start-key kstart and end-key kend, assuming kstart < kend; * Is it true that I'll get the range [kstart,kend) (kstart inclusive, kend exclusive)?

Re: Anyone using hadoop/MapReduce integration currently?

2010-05-25 Thread Utku Can Topçu
Hi Jeremy, Why are you using Cassandra versus using data stored in HDFS or HBase? - I'm thinking of using it for realtime streaming of user data. While streaming the requests, I'm also using Lucandra for indexing the data in realtime. It's a better option when you compare it with HBase or the

Re: Real-time Web Analysis tool using Cassandra. Doubts...

2010-05-12 Thread Utku Can Topçu
What makes cassandra a poor choice is the fact that, you can't use a keyrange as input for the map phase for Hadoop. On Wed, May 12, 2010 at 4:37 PM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati paulogpoi...@gmail.com wrote: - First of all,

Distributed export and import into cassandra

2010-05-03 Thread Utku Can Topçu
Hey All, I have a simple sample use case. The aim is to export the columns in a column family into flat files with the keys in range from k1 to k2. Since all the nodes in the cluster are supposed to contain some of the distribution of data, is it possible to make each node dump its own local data

ColumnFamilyOutputFormat?

2010-04-30 Thread Utku Can Topçu
Hey All, I've been looking at the documentation and related articles about Cassandra and Hadoop integration, I'm only seeing ColumnFamilyInputFormat for now. What if I want to write directly to cassandra after a reduce? What comes to my mind is, in the Reducer's setup I'd initialize a Cassandra

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-30 Thread Utku Can Topçu
at 3:22 PM, Jonathan Ellis jbel...@gmail.com wrote: Sounds like doing this w/o m/r with get_range_slices is a reasonable way to go. On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu u...@topcu.gen.tr wrote: I'm currently writing collected data continuously to Cassandra, having keys starting

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-30 Thread Utku Can Topçu
I meant in the first sentence running the get_range_slices from a single point On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu u...@topcu.gen.tr wrote: Do you mean, running the get_range_slices from a single? Yes, it would be reasonable for a relatively small key range, when it comes

TimedOutException when using the ColumnFamilyInputFormat

2010-04-29 Thread Utku Can Topçu
Hey All, I'm trying to run some tests on cassandra and Hadoop integration. I'm basically following the word count example at https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java using the ColumnFamilyInputFormat. Currently I have one-node cassandra and hadoop

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Utku Can Topçu
, 2010 at 11:32 PM, Jonathan Ellis jbel...@gmail.com wrote: It's technically possible but 0.6 does not support this, no. What is the use case you are thinking of? On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu u...@topcu.gen.tr wrote: Hi, I've been trying to use Cassandra for some kind