Memory use grows extremely fast with super column families
----------------------------------------------------------

                 Key: CASSANDRA-1230
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1230
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.6.2
         Environment: Single node Ubuntu 10.04 64 bit, sun-java6 from partner 
repositories, using pycassa 0.3.0 to insert events.
            Reporter: Heikki Toivonen
            Priority: Critical


I have a script that inserts about 1kB of key/values into 10k super columns 
each into 1k rows. Or at least I tried to. I noticed that Cassandra's memory 
usage went up so fast that I was only able to insert into a few dozen rows 
before my machine run out of memory. When I use regular column families 
Cassandra's memory usage seems pretty flat, so this seems an issue specifically 
with super columns.

Here's the test program:

#!/usr/bin/env python
# Program to demonstrate a use case where Cassandra memory usage grows
# without bounds using super column family:
#  -  1 row  140 MB RES 1400 MB VIRT
#  -  5 rows 532        1600
#  - 10      580        1632
#  - 20      801        1775
#  - 40      958        2047
#  ...
#
# Stopping Cassandra and restarting makes it jump immediately to the same
# virtual memory usage. Resident memory size seems to be about
# half of the state prior to stopping.
# 
# _JAVA_OPTIONS: -Xms64m -Xmx1G
# Cassandra 0.6.2 with default storage-conf.xml on single node
# Ubuntu 10.04 64bit
# sun-java6
# pycassa 0.3.0

import uuid

import pycassa

def insert10k(cf, rowkey):
    for i in xrange(10000):
        cf.insert(rowkey, {
                str(i): {
                    "abcdefghijklmnopqrstuvwxyz":'1234567890',
                    "bbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "cbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "dbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "ebcdefghijklmnopqrstuvwxyz":'1234567890',
                    "fbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "gbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "hbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "ibcdefghijklmnopqrstuvwxyz":'1234567890',
                    "jbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "kbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "lbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "mbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "nbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "obcdefghijklmnopqrstuvwxyz":'1234567890',
                    "pbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "qbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "rbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "sbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "tbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "ubcdefghijklmnopqrstuvwxyz":'1234567890',
                    "vbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "wbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "xbcdefghijklmnopqrstuvwxyz":'1234567890',
                    "ybcdefghijklmnopqrstuvwxyz":'1234567890',
                    "zbcdefghijklmnopqrstuvwxyz":'1234567890',
                    },
                })    

def super_column():
    client = pycassa.connect()
    cf = pycassa.ColumnFamily(client, 'Keyspace1', 'Super1', super=True)

    i = 0
    while i < 1000:
        insert10k(cf, uuid.uuid4().hex)
        print i, 'inserted 10k'
        i += 1

if __name__ == '__main__':
    super_column()


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to