column sizes (was: online codes (?))

2010-02-03 Thread Ted Zlatanov
On Tue, 2 Feb 2010 23:05:04 -0600 Jonathan Ellis jbel...@gmail.com wrote: JE> The atom in cassandra is a single column. These are almost always under 1KB. Is there any point to storing large objects (over 100MB) in Cassandra columns? I'm considering it but it seems like a bad idea based on

read performance very bad with 1M keys, 600 columns per key.

2010-02-03 Thread envio user
Hello Developers, I originally posted this message to cassandra-u...@incubator.apache.org and want to find out whether my tests are OK. I ran these tests after seeing Jonathan's blog (http://spyced.blogspot.com/) and used the same stress.py. H/W: Single node, Quad Core (8 cores), 8GB RAM: Two

Re: read performance very bad with 1M keys, 600 columns per key.

2010-02-03 Thread Daniel Lundin
On 2010-02-03 14:31, envio user wrote: /home/sun> python stress.py -n 100 -t 100 -c 25 -r -o read -i 10 WARNING: multiprocessing not present, threading will be used. Benchmark may not be accurate! You should make sure multiprocessing is installed and in use for stress.py, otherwise

Re: bitmap slices

2010-02-03 Thread Ted Zlatanov
On Mon, 1 Feb 2010 11:14:12 -0600 Jonathan Ellis jbel...@gmail.com wrote: JE> 2010/2/1 Ted Zlatanov t...@lifelogs.com: On Mon, 1 Feb 2010 10:41:28 -0600 Jonathan Ellis jbel...@gmail.com wrote: JE> I don't think this is very useful for column names. I could see it being useful for values but

heap memory

2010-02-03 Thread Suhail Doshi
Something I've been monitoring lately is heap memory, because it can cause a lot of problems if you hit the JVM max memory limit. Usually, when you do, garbage collection runs and pegs one of the CPU cores at 100%. That causes clients who

Re: heap memory

2010-02-03 Thread Suhail Doshi
Another interesting thing I am seeing is how the heap memory just drops (which I think is due to garbage collection, but I'm not certain). If you trigger garbage collection manually it will peg the CPU and drop the heap memory to a lower level, which is why I think it is. In the attached picture the heap memory

Re: column sizes (was: online codes (?))

2010-02-03 Thread Michael Pearson
I'd imagine the gossip overhead and key/column per disk limitation are too open for abuse to recommend storing LOB columns with any level of predictability, particularly if frequent updates are involved. Would you say it's generally better form to store manifests or file pointers only, and send

Re: column sizes (was: online codes (?))

2010-02-03 Thread Michael Pearson
Thanks for the Gossip note, I'll keep reading up on the protocols. For key/column/disk I meant in terms of the Cassandra limitation - The main limitation on column and supercolumn size is that all data for a single key and column must fit (on disk) on a single machine in the cluster. Is it right

Re: column sizes (was: online codes (?))

2010-02-03 Thread Jonathan Ellis
That's correct. On Wed, Feb 3, 2010 at 4:49 PM, Michael Pearson mjpear...@gmail.com wrote: Thanks for the Gossip note, I'll keep reading up on the protocols. For key/column/disk I meant in terms of the Cassandra limitation - The main limitation on column and supercolumn size is that all data

Re: bitmap slices

2010-02-03 Thread Jonathan Ellis
It seems to me that the bitmask is only really useful for the SliceRange predicate. A predicate of "fetch these column names, but only if they match this mask" seems strange. The mask check needs to be done in the Slice Filter, not SP. Is this actually powerful enough to solve a real