Hi All,
Let's assume we have a use case where we need to count the number of
columns for a given key. Let's say the key is the URL and the column-name
is the IP address or any cardinality identifier.
The straightforward implementation seems simple: just inserting the
IP addresses as
may need some work.
Another alternative is the self-learning bitmap (
http://ect.bell-labs.com/who/aychen/sbitmap4p.pdf), which, in my
understanding, is more memory efficient when counting small values.
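The self-learning bitmap in the paper adapts its sampling as it fills; as a rough illustration of the bitmap-counting family it belongs to, here is a plain linear-counting sketch in Python. This is not the paper's algorithm, and the bitmap size `m` and MD5 hash are illustrative choices:

```python
import hashlib
import math

class LinearCounter:
    """Estimate the number of distinct items with a fixed-size bitmap.

    Plain linear counting, shown only to illustrate bitmap-based
    cardinality estimation; the self-learning bitmap is an adaptive
    refinement of this idea.
    """

    def __init__(self, m=4096):
        self.m = m               # number of bits in the bitmap
        self.bits = [False] * m

    def add(self, item):
        # Hash the item (e.g. an IP address) to one of m bit positions.
        h = int(hashlib.md5(item.encode()).hexdigest(), 16)
        self.bits[h % self.m] = True

    def estimate(self):
        # n is approximately -m * ln(V), V = fraction of bits still zero.
        zeros = self.bits.count(False)
        if zeros == 0:
            return float("inf")  # bitmap saturated; m was too small
        return -self.m * math.log(zeros / self.m)

counter = LinearCounter()
for ip in ("10.0.%d.%d" % (i // 256, i % 256) for i in range(1000)):
    counter.add(ip)
# counter.estimate() should now land close to the true cardinality of 1000.
```

The memory cost is a fixed m bits per key regardless of how many columns arrive, which is the appeal over storing every IP as a column.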
Yuki
On Wednesday, June 13, 2012 at 11:28 AM, Utku Can Topçu wrote:
Hi All,
Let's assume we
As far as I can tell, this functionality doesn't exist.
However, you could insert the rowId into another column
within a separate row, and request the latest column.
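The pattern above can be modeled in a few lines. This is only an in-memory toy (a dict and a counter standing in for the extra index row and its timestamp-ordered column names), not client code for any particular Cassandra driver:

```python
import itertools

_clock = itertools.count()  # stand-in for a timestamp source
index_row = {}              # stand-in for the separate index row

def record_insert(row_id):
    # On every insert into the data CF, also write the rowId into the
    # index row, keyed by a timestamp-like column name.
    index_row[next(_clock)] = row_id

def latest_row_id():
    # "Request the latest column": the column with the greatest name.
    return index_row[max(index_row)]

record_insert("row-a")
record_insert("row-b")
record_insert("row-c")
# latest_row_id() now returns "row-c"
```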
I think this would work for you. However, every insert would need a get
request, which I think would be
, 2011 at 1:59 PM, Sylvain Lebresne sylv...@datastax.com wrote:
On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hello,
I'm using the 0.8.0-rc1, with RF=2 and 4 nodes.
Strangely, counters are corrupted. Say, the actual value should be: 51664
and the value
How about implementing a freezing mechanism on counter columns?
If there are no more increments within freeze seconds after the last
increment (it would be on the order of a day or so), the column would lock
itself and won't accept further increments.
And after this freeze period, the ttl should
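The freezing idea could be sketched as below. This is a minimal client-side model, assuming a wall-clock check at increment time; the freeze window and the injectable clock are illustrative, not anything Cassandra provides:

```python
import time

class FreezingCounter:
    """A counter that refuses increments after `freeze` seconds of inactivity."""

    def __init__(self, freeze_seconds, clock=time.time):
        self.freeze = freeze_seconds
        self.clock = clock          # injectable so tests need not sleep
        self.value = 0
        self.last_increment = self.clock()
        self.frozen = False

    def increment(self, delta=1):
        now = self.clock()
        # If more than `freeze` seconds passed since the last increment,
        # the counter locks itself and rejects this and all later updates.
        if self.frozen or now - self.last_increment > self.freeze:
            self.frozen = True
            return False
        self.value += delta
        self.last_increment = now
        return True

# Drive it with a fake clock so the freeze is observable without sleeping.
t = [0.0]
c = FreezingCounter(freeze_seconds=3600, clock=lambda: t[0])
c.increment()            # accepted at t=0
t[0] = 1800.0
c.increment()            # accepted: only 30 minutes of inactivity
t[0] = 7200.0
c.increment()            # rejected: more than an hour since last increment
```

After the counter freezes, a TTL starting from the last increment would let the column expire as proposed above.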
Hello,
I'm using the 0.8.0-rc1, with RF=2 and 4 nodes.
Strangely, counters are corrupted. Say, the actual value should be: 51664
and the value that Cassandra sometimes outputs is either 51664 or 18651001.
And I have no idea on how to diagnose the problem or reproduce it.
Can you help me
Some additional information on the settings:
I'm using CL.ONE for both reading and writing, and replicate_on_write is
true on the Counters CF.
I think the problem occurs after a restart when the commitlogs are read.
On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi guys,
I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the way
it should be but:
- I create a ColumnFamily named Counters
- do a few increments on a column.
- kill cassandra
- start cassandra
When I look at the counter column, the value is 1.
See the following pastebin
Please see the ticket https://issues.apache.org/jira/browse/CASSANDRA-2642
On Thu, May 12, 2011 at 3:28 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi guys,
I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the
way it should be but:
- I create a ColumnFamily named
Hi All,
I'm experimenting and developing using counters. However, I've come to a
use case where I need counters to expire and get deleted after a certain time
of inactivity (i.e. have the counter column deleted one hour after the last
increment).
As far as I can tell, counter columns don't have TTL in
http://wiki.apache.org/cassandra/ThirdPartySupport
On Thu, Feb 17, 2011 at 12:20 AM, Sal Fuentes fuente...@gmail.com wrote:
They also offer great training sessions. Have a look at their site for more
information: http://www.datastax.com/about-us
On Wed, Feb 16, 2011 at 3:13 PM, Michael
Can anyone confirm that this patch works with the current trunk?
On Thu, Feb 17, 2011 at 4:16 PM, Sylvain Lebresne sylv...@datastax.com wrote:
https://issues.apache.org/jira/browse/CASSANDRA-2103
On Thu, Feb 17, 2011 at 4:05 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi All,
I'm
And I think this patch would still be useful and legitimate if the TTL of
the initial increment is taken into account.
On Thu, Feb 17, 2011 at 6:11 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
Yes, I've read the discussion. My use-case is similar to the use-case of
the contributor.
So that's
.
Would that work for you?
Aaron
On 9 Feb 2011, at 23:58, Utku Can Topçu wrote:
Hi All,
I'm sure people here have tried to solve similar questions.
Say I'm tracking pages, and I want to access the 1000 least recently used
unique pages (i.e. column names). How can I achieve this?
Using
Dear Bill,
What about the size of the row in the Messages CF? Is it too big? Might you
be incurring bandwidth overhead?
Regards,
Utku
On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs bill.spe...@gmail.com wrote:
I have a 7 node setup with a replication factor of 1 and a read
consistency of
Speirs bill.spe...@gmail.com
wrote:
Each message row is well under 1K. So I don't think it is network... plus
all boxes are on a fast LAN.
Bill-
On Feb 10, 2011 11:59 AM, Utku Can Topçu u...@topcu.gen.tr wrote:
Dear Bill,
How about the size of the row in the Messages CF. Is it too
Hi All,
I'm sure people here have tried to solve similar questions.
Say I'm tracking pages, and I want to access the 1000 least recently used
unique pages (i.e. column names). How can I achieve this?
Using a row with, say, ttl=60 seconds would solve the problem of accessing
the least recently used
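One client-side way to keep pages in recency order without TTL tricks is an ordered map; a sketch in Python (not a Cassandra feature), assuming `capacity` bounds how many pages are kept:

```python
from collections import OrderedDict

class RecencyTracker:
    """Track pages by recency of access, bounded to `capacity` entries."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.pages = OrderedDict()   # oldest access first, newest last

    def touch(self, page):
        # Move the page to the most-recently-used end.
        self.pages.pop(page, None)
        self.pages[page] = True
        # Evict the least recently used page if we exceed capacity.
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def least_recently_used(self, n):
        # The first n keys are the n least recently touched pages.
        return list(self.pages)[:n]

tracker = RecencyTracker(capacity=3)
for page in ["/a", "/b", "/c", "/a", "/d"]:
    tracker.touch(page)
# "/b" was evicted; "/c" is now the least recently used page kept.
```

With capacity=1000, `least_recently_used(1000)` returns exactly the set the question asks for, at the cost of maintaining the structure on every access.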
I've created an issue, was this what you were asking Jonathan?
https://issues.apache.org/jira/browse/CASSANDRA-1927
On Mon, Jan 3, 2011 at 12:24 AM, Jonathan Ellis jbel...@gmail.com wrote:
Can you create one?
On Sun, Jan 2, 2011 at 4:39 PM, mck m...@apache.org wrote:
Is this a bug or
Since no reply came in a few days, I tried my proposed steps and it all
worked fine.
Just to let you know.
On Sat, Dec 4, 2010 at 10:31 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi All,
I'm currently not happy with the hardware and the operating system of our
4-node cassandra cluster. I'm
Hi All,
I'm currently not happy with the hardware and the operating system of our
4-node cassandra cluster. I'm planning to move the cluster to a different
hardware/OS architecture.
For this purpose I'm planning to bring up 4 new nodes, so that each node
will be a replacement of another node in
Hi All,
The question is really simple: is there anyone out there using a set of
scripts in production that detects failures of Cassandra processes and
restarts them or takes the required actions?
If so, how can we implement a generic solution for this problem?
Regards,
Utku
Hello All,
I'm wondering, before restarting a node in a cluster: if I delete the
system keyspace, what data would I be losing? Would I be losing anything?
Regards,
Utku
Everything but the hints can be replaced.
Gary.
On Mon, Nov 15, 2010 at 06:29, Utku Can Topçu u...@topcu.gen.tr wrote:
Hello All,
I'm wondering, before restarting a node in a cluster: if I delete the
system keyspace, what data would I be losing? Would I be losing
anything
When I try to read a CF from Hadoop, just after issuing the run I get this
error:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at
...@gmail.com wrote:
On Wed, Oct 27, 2010 at 05:08, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi,
For a columnfamily in a keyspace which has RF=3, I'm issuing writes with
ConsistencyLevel.ONE.
in the configuration I have:
- memtable_flush_after_mins : 30
- memtable_throughput_in_mb : 32
Hi,
For a columnfamily in a keyspace which has RF=3, I'm issuing writes with
ConsistencyLevel.ONE.
in the configuration I have:
- memtable_flush_after_mins : 30
- memtable_throughput_in_mb : 32
I'm writing to this columnfamily continuously for about 1 hour then stop
writing.
So the question
If I'm not mistaken, Cassandra has been providing support for key-range
queries on RP as well.
However, when I try to define a key range such as (start: key100, end:
key200), I get an error like:
InvalidRequestException(why: start key's md5 sorts after end key's md5. this
is not allowed; you probably
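The error arises because the RandomPartitioner stores rows in MD5-token order, not key order, so a lexical range like key100..key200 rarely maps onto a contiguous token range. A quick demonstration that token order disagrees with key order (plain hashlib MD5 here as a stand-in for the partitioner's token function):

```python
import hashlib

def token(key):
    # Stand-in for RandomPartitioner's token: the MD5 of the key.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = ["key%03d" % i for i in range(100, 201)]  # key100 .. key200

# Count adjacent key pairs whose MD5 tokens are out of lexical order.
inversions = sum(1 for a, b in zip(keys, keys[1:]) if token(a) > token(b))

# Roughly half of the adjacent pairs invert, which is why a lexical
# start/end key pair cannot be mapped onto the token ring.
```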
Hi All,
In the current project I'm working on, I have a use case for analyzing
the rows hourly.
Since the 0.7.x branch supports creating and dropping column families on the
fly, my use case proposal is:
* Create a CF at the very beginning of every hour
* At the end of the 1-hour period,
Hi,
In order to continue with memory optimizations, I've been trying to use
JNA. However, when I copy the jna.jar to the lib directory, I get the
warning below. I'm currently running the 0.6.5 version of Cassandra.
WARN [main] 2010-10-08 09:16:18,924 FBUtilities.java (line 595) Unknown
mlockall
I'm running an Ubuntu 9.10 linux box.
On Fri, Oct 8, 2010 at 11:33 AM, Roger Schildmeijer
schildmei...@gmail.com wrote:
On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi,
In order to continue on memory optimizations, I've been trying to use the
JNA. However, when
that mlockall
error 0.
Maybe there is another solution anyway.
nico008
On 08/10/2010 11:33, Roger Schildmeijer wrote:
On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi,
In order to continue on memory optimizations, I've been trying to use the
JNA. However
Hi Oleg,
I've been also looking into these after some research.
I've been tackling it with:
1. Setting the default max and min heap from 1G to 1500M.
2. I'm not using row caches, and the key caches are set to 1000; before, they
were 200K as the default.
3. I've lowered the memtable throughput to 32MB
4.
.
On Mon, Oct 4, 2010 at 8:48 AM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi Jonathan,
Thank you for mentioning the expiring columns issue. I didn't know
that it existed.
That's really great news.
First of all, does the current 0.6 branch support it? If not, is the
patch
.
On Mon, Oct 4, 2010 at 5:12 AM, Utku Can Topçu u...@topcu.gen.tr wrote:
Hey All,
I'm planning to run Map/Reduce on one of the ColumnFamilies. The keys are
formed in such a fashion that they are indexed in descending order by
time.
So I'll be analyzing the data for every hour iteratively
Hey All,
Recently I've tried to upgrade (hw upgrade) one of the nodes in my cassandra
cluster from ec2-small to ec2-large.
However, there were problems: since the IP of the new instance was
different from the previous instance's, the other nodes did not recognize it in
the ring.
So what should
Hi All,
We're currently running a cassandra cluster with Replication Factor 3,
consisting of 4 nodes.
The current situation is:
- The nodes are all identical (AWS small instances)
- The data directory is in the partition (/mnt), which has 150 GB capacity; each
node has around 90 GB load, so 60 G
Hi All,
I'm planning to use the current 0.6.4 stable for creating an image that
would be the base for nodes in our Cassandra cluster.
However, the 0.6.5 release is on the way. When 0.6.5 has been released,
is it possible to have some of the nodes stay on 0.6.4 while having new nodes
on 0.6.5?
Hi All,
I was browsing through the Lucene JIRA and came across the issue named "A
Column-Oriented Cassandra-Based Lucene Directory" at
https://issues.apache.org/jira/browse/LUCENE-2456
Has anyone had a chance to test it? If so, do you think it's an efficient
implementation as a replacement for the
Hey Guys,
I've been designing an application which consists of more than 20
ColumnFamilies.
Each ColumnFamily has some columns referencing keys in other
ColumnFamilies,
and some keys in a ColumnFamily are combinations of keys/columns in other
ColumnFamilies.
I guess most of the people are
Hey Guys,
Currently in a project I'm involved in, I need to have some columns holding
incremented data.
The easy approach for implementing a counter with increments, as far as
I can figure out, is read - increment - insert; however, this approach is not
an atomic operation and can easily be
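The lost-update hazard in read - increment - insert can be shown with a deterministic interleaving of two clients. An in-memory dict stands in for the column here; no Cassandra client is involved:

```python
# The "column" both clients read and write.
store = {"hits": 0}

def read(key):
    return store[key]

def insert(key, value):
    store[key] = value

# Interleaving: both clients read before either writes.
a = read("hits")        # client A reads 0
b = read("hits")        # client B also reads 0
insert("hits", a + 1)   # A writes 1
insert("hits", b + 1)   # B overwrites with 1 -- A's increment is lost

# Two increments happened, but the stored value is 1, not 2.
```

Any two clients racing through this sequence can silently drop increments, which is why a read-modify-write loop is not a safe counter.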
Hey All,
First of all, I'll start with some questions on the default behavior of
the get_range_slices method defined in the Thrift API.
Given a keyrange with start-key kstart and end-key kend, assuming
kstart < kend:
* Is it true that I'll get the range [kstart,kend) (kstart inclusive, kend
exclusive)?
Hi Jeremy,
Why are you using Cassandra versus using data stored in HDFS or HBase?
- I'm thinking of using it for realtime streaming of user data. While
streaming the requests, I'm also using Lucandra for indexing the data in
realtime. It's a better option when you compare it with HBase or the
What makes Cassandra a poor choice is the fact that you can't use a
key range as input for the map phase in Hadoop.
On Wed, May 12, 2010 at 4:37 PM, Jonathan Ellis jbel...@gmail.com wrote:
On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati
paulogpoi...@gmail.com wrote:
- First of all,
Hey All,
I have a simple sample use case:
the aim is to export the columns in a column family into flat files, with the
keys in range from k1 to k2.
Since all the nodes in the cluster are supposed to contain some of the
distribution of the data, is it possible to make each node dump its own local
data
Hey All,
I've been looking at the documentation and related articles about Cassandra
and Hadoop integration; I'm only seeing ColumnFamilyInputFormat for now.
What if I want to write directly to cassandra after a reduce?
What comes to my mind is, in the Reducer's setup I'd initialize a Cassandra
at 3:22 PM, Jonathan Ellis jbel...@gmail.com wrote:
Sounds like doing this w/o m/r with get_range_slices is a reasonable way to
go.
On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
I'm currently writing collected data continuously to Cassandra, having
keys
starting
In the first sentence, I meant running the get_range_slices from a single
point.
On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu u...@topcu.gen.tr wrote:
Do you mean, running the get_range_slices from a single? Yes, it would be
reasonable for a relatively small key range, when it comes
Hey All,
I'm trying to run some tests on Cassandra and Hadoop integration. I'm
basically following the word count example at
https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
using the ColumnFamilyInputFormat.
Currently I have a one-node Cassandra and Hadoop
, 2010 at 11:32 PM, Jonathan Ellis jbel...@gmail.com wrote:
It's technically possible but 0.6 does not support this, no.
What is the use case you are thinking of?
On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu u...@topcu.gen.tr
wrote:
Hi,
I've been trying to use Cassandra for some kind