Re: datastax community ami -- broken? --- datastax-agent conflicts with opscenter-agent
Hi Joaquin, A quick word of praise - addressing the issue so quickly reflects really well on DataStax. cheers

On Sat, Dec 7, 2013 at 8:14 AM, Joaquin Casares joaq...@datastax.com wrote: Hello again John, The AMI has been patched and tested for both DSE and C* and works for the standard 3 node test. The new code has been pushed to the 2.4 branch so launching a new set of instances will give you an updated AMI. You should now have the newest version of OpsCenter installed, along with the new DataStax Agents (that replace the OpsCenter Agents). Also, I've patched the two bugs for the motd and for allowing the other nodes to join. The issue came from a new release of nodetool that contained some unexpected text that choked up the AMI code as it waited for nodes to come online. Let me know if you see any further issues. Thanks, Joaquin Casares DataStax Software Engineer in Test http://www.datastax.com/what-we-offer/products-services/training/virtual-training

On Fri, Dec 6, 2013 at 2:02 PM, Joaquin Casares joaq...@datastax.com wrote: Hey John, Thanks for letting us know. I'm also seeing that the motd gets stuck, but if I ctrl-c during the message and try a `nodetool status` there doesn't appear to be an issue. I'm currently investigating why it's getting stuck. Are you seeing something similar? What happens if you try to run a `sudo service cassandra restart`? Could you send me your /var/log/cassandra/system.log if this still fails? Also, I realize now that the package name for the newest version of OpsCenter changed from opscenter-free to opscenter. I committed that change to our dev AMI and am testing it now. Once this change is made you will no longer have to install agents via OpsCenter since they should already be on the system. That being said, you won't hit the current OpsCenter/DataStax Agent version mismatch you've been hitting. Also, we currently only have one AMI. Each time an instance is launched the newest version of the code is pulled down from https://github.com/riptano/ComboAMI to ensure the code never gets stale and can easily keep up with DSE/C* releases as well as AMI code fixes. I'll reply again as soon as I figure out and patch this motd issue. Thanks, Joaquin Casares DataStax Software Engineer in Test http://www.datastax.com/what-we-offer/products-services/training/virtual-training

On Fri, Dec 6, 2013 at 7:16 AM, John R. Frank j...@mit.edu wrote: Hi C* experts, In the last 18hrs or so, I have been having trouble getting cassandra instances to launch using the datastax community AMI. Has anyone else seen this? The instance comes up but then cassandra fails to run. The most informative error message that I've seen so far is in the opscenter agent install log (below) --- see especially this line: datastax-agent conflicts with opscenter-agent I have also attached the ami.log I run into these issues when launching with either the browser page: https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2 Or command line, like this: ec2-run-instances ami-814ec2e8 -t m1.large --region us-east-1 --key buildbot -n $num_instances -d --clustername Foo --totalnodes $num_instances --version community --opscenter yes Is there a newer AMI? Any advice? jrf

Some agent installations failed: - 10.6.133.241: Failure installing agent on 10.6.133.241. Error output: Unable to install the opscenter-agent package. Please check your apt-get configuration as well as the agent install log (/var/log/opscenter-agent/installer.log). Standard output: Removing old opscenter-agent files.
opscenter-agent: unrecognized service
Reading package lists... Building dependency tree... Reading state information...
0 upgraded, 0 newly installed, 0 to remove and 171 not upgraded.
Starting agent installation process for version 3.2.2
Reading package lists... Building dependency tree... Reading state information...
sysstat is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 171 not upgraded.
Selecting previously unselected package opscenter-agent.
dpkg: regarding .../opscenter-agent.deb containing opscenter-agent:
 datastax-agent conflicts with opscenter-agent
 opscenter-agent (version 3.2.2) is to be installed.
 opscenter-agent provides opscenter-agent and is to be installed.
dpkg: error processing opscenter_agent_setup.vYRzL0Tevn/opscenter-agent.deb (--install):
 conflicting packages - not installing opscenter-agent
Errors were encountered while processing:
 opscenter_agent_setup.vYRzL0Tevn/opscenter-agent.deb
FAILURE: Unable to install the opscenter-agent package. Please check your apt-get configuration as well as the agent install log (/var/log/opscenter-agent/installer.log). Exit code: 1

-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4
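For anyone who hits the same dpkg conflict before relaunching from the patched AMI, a rough manual workaround is to remove the old agent and install the new one by hand. This is only a sketch: it assumes the DataStax apt repository is already configured on the node and that the replacement package and service are named datastax-agent, as with the newer OpsCenter releases.

  # remove the old agent that the new package conflicts with
  sudo dpkg --remove opscenter-agent
  sudo apt-get update
  # install and start the replacement agent
  sudo apt-get install datastax-agent
  sudo service datastax-agent restart

The cleaner fix, per Joaquin's reply, is simply to launch fresh instances so the updated AMI ships the correct agent in the first place.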
Re: How would you model that?
How about something like using a time-range as the key (e.g an hour depending on your update rate) and a composite (time:user) as the column name cheers On Fri, Nov 8, 2013 at 10:45 PM, Laing, Michael michael.la...@nytimes.comwrote: You could try this: CREATE TABLE user_activity (shard text, user text, ts timeuuid, primary key (shard, ts)); select user, ts from user_activity where shard in ('00', '01', ...) order by ts desc; Grab each user and ts the first time you see that user. Use as many shards as you think you need to control row size and spread the load. Set ttls to expire user_activity entries when you are no longer interested in them. ml On Fri, Nov 8, 2013 at 6:10 AM, pavli...@gmail.com pavli...@gmail.comwrote: Hey guys, I need to retrieve a list of distinct users based on their activity datetime. How can I model a table to store that kind of information? The straightforward decision was this: CREATE TABLE user_activity (user text primary key, ts timeuuid); but it turned out it is impossible to do a select like this: select * from user_activity order by ts; as it fails with ORDER BY is only supported when the partition key is restricted by an EQ or an IN. How would you model the thing? Just need to have a list of users based on their last activity timestamp... Thanks! -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
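A minimal cqlsh sketch of the sharded variant Michael describes. The keyspace name, the 16-way sharding, the sample shard values and the 7-day TTL are illustrative assumptions, not part of the original suggestion:

  cqlsh <<'CQL'
  -- assumes keyspace demo already exists
  CREATE TABLE demo.user_activity (
    shard text,
    user text,
    ts timeuuid,
    PRIMARY KEY (shard, ts)
  );

  -- on each activity event: pick a shard (e.g. hash(user) % 16) and let old entries expire
  INSERT INTO demo.user_activity (shard, user, ts)
  VALUES ('03', 'alice', now()) USING TTL 604800;

  -- reader: fan out across the shards, newest first, keep the first ts seen per user
  SELECT user, ts FROM demo.user_activity
  WHERE shard IN ('00', '01', '02', '03')
  ORDER BY ts DESC;
  CQL

The shard count is the knob for both row size and write distribution; the de-duplication of users still happens client side, as described above.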
Re: Storage management during rapid growth
I can't comment on the technical question, however one thing I learnt with managing the growth of data is that the $/GB of storage tends to drop at a rate that can absorb a moderate proportion of the increase in cost due to the increase in size of data. I'd recommend having a wet-finger-in-the-air stab at projecting the growth in data sizes versus the historical trends in the decrease in cost of storage. cheers

On Fri, Nov 1, 2013 at 7:15 AM, Dave Cowen d...@luciddg.com wrote: Hi, all - I'm currently managing a small Cassandra cluster, several nodes with local SSD storage. It's difficult for us to forecast the growth of the Cassandra data over the next couple of years for various reasons, but it is virtually guaranteed to grow substantially. During this time, there may be times where it is desirable to increase the amount of storage available to each node, but, assuming we are not I/O bound, keep from expanding the cluster horizontally with additional nodes that have local storage. In addition, expanding with local SSDs is costly. My colleagues and I have had several discussions of a couple of other options that don't involve scaling horizontally or adding SSDs:
1) Move to larger, cheaper spinning-platter disks. However, when monitoring the performance of our cluster, we see sustained periods - especially during repair/compaction/cleanup - of several hours where there are 2000 IOPS. It will be hard to get to that level of performance in each node with spinning platter disks, and we'd prefer not to take that kind of performance hit during maintenance operations.
2) Move some nodes to a SAN solution, ensuring that there is a mix of storage, drives, LUNs and RAIDs so that there isn't a single point of failure. While we're aware that this is frowned on in the Cassandra community due to Cassandra's design, a SAN seems like the obvious way of being able to quickly add storage to a cluster without having to juggle local drives, and provides a level of performance between local spinning platter drives and local SSDs.
So, the questions:
1) Has anyone moved from SSDs to spinning-platter disks, or managed a cluster that contained both? Do the numbers we're seeing exaggerate the performance hit we'd see if we moved to spinners?
2) Have you successfully used a SAN or a hybrid SAN solution (some local, some SAN-based) to dynamically add storage to the cluster? What type of SAN have you used, and what issues have you run into?
3) Am I missing a way of economically scaling storage?
Thanks for any insight. Dave
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
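To make that trade-off concrete, here is a toy projection in the spirit of the wet-finger-in-the-air stab. Every number below is a made-up assumption (starting size, growth rate, price decline); only the shape of the comparison matters:

  awk 'BEGIN {
    data_tb = 10; cost_gb = 0.50                  # assumed: 10 TB today at $0.50/GB
    for (y = 0; y <= 3; y++) {
      printf "year %d: %6.1f TB  *  $%.2f/GB  ~= $%.0f\n", y, data_tb, cost_gb, data_tb * 1024 * cost_gb
      data_tb *= 1.8                              # assumed: data grows ~80% per year
      cost_gb *= 0.75                             # assumed: $/GB falls ~25% per year
    }
  }'

With those made-up rates the yearly spend still rises, but much more slowly than the raw data volume, which is the point being made above.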
Re: Recommended hardware
Far from an expert opinion, however one configuration I have seen talked about is 3 x m1.xlarge in AWS. I have tested 4 x m1.xlarge and 4 x m1.large. The m1.xlarge was fine for our tests (we were hitting it pretty hard), the m1.large was erratic - from that I took away that you either need to give Cassandra sufficient resources or know how to tune it properly (I don't). cheers

On Tue, Sep 24, 2013 at 2:17 AM, Tim Dunphy bluethu...@gmail.com wrote: Hello, I am running Cassandra 2.0 on 2GB of memory and a 10GB HD in a virtual cloud environment. It's supporting a php application running on the same node. Mostly this instance runs smoothly but runs low on memory. Depending on how much the site is used, the VM will swap out sometimes excessively. I realize this setup may not be enough to support a cassandra instance. I was wondering if there were any recommended hardware specs someone could point me to for both physical and virtual (cloud) type environments. Thank you, Tim Sent from my iPhone
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
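For the specific 2GB node that is swapping, the usual first steps are to stop the box from swapping at all and to pin the JVM heap to something the machine can actually give it. A rough sketch; the heap numbers are illustrative for a 2GB host, not a recommendation:

  # see how much memory and swap are really in use
  free -m
  # Cassandra behaves badly once the JVM starts swapping; turn swap off
  sudo swapoff -a          # and remove the swap entry from /etc/fstab to make it permanent
  # cap the heap in conf/cassandra-env.sh instead of letting it auto-size, e.g.
  #   MAX_HEAP_SIZE="1G"
  #   HEAP_NEWSIZE="200M"
  sudo service cassandra restart

None of this changes the underlying point of the replies: 2GB shared with a PHP application is below what Cassandra is normally given, so tuning only buys a little headroom.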
Re: cassandra just gone..no heap dump, no log info
A random guess - possibly an OOM (Out of Memory) kill, where Linux will kill a process to recover memory when it is desperately low on memory. Have a look in either your syslog output or the output of dmesg. cheers

On Wed, Sep 18, 2013 at 10:21 PM, Hiller, Dean dean.hil...@nrel.gov wrote: Anyone know how to debug cassandra processes just exiting? There is no info in the cassandra logs and there is no heap dump file (which in the past has shown up in the /opt/cassandra/bin directory for me). This occurs when running a map/reduce job that puts severe load on the system. The logs look completely fine. I find it odd:
1. No logs of why it exited at all
2. No heap dump, which would imply there would be no logs as it crashed
Is there any other way a process can die and linux would log it somehow? (like running out of memory) Thanks, Dean
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
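A quick way to check that guess on the node that died. The paths assume a Debian-style layout; on RHEL/CentOS the log is /var/log/messages:

  # the kernel OOM killer logs the kill in the kernel ring buffer and in syslog
  dmesg | grep -i -E 'out of memory|killed process'
  sudo grep -i -E 'out of memory|killed process' /var/log/syslog

If the java process shows up there, the JVM never got a chance to write a heap dump or log anything, which matches the symptoms Dean describes.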
Question about 'duplicate' columns
I've been thinking through some cases that I can see happening at some point and thought I'd ask on the list to see if my understanding is correct. Say a bunch of columns have been loaded 'a long time ago', i.e. long enough in the past that they have been compacted. My understanding is that if some of these columns get reloaded then they are likely to sit in additional sstables until the larger sstable is called up for compaction, which might be a while. The case that springs to mind is filling small gaps in data by doing bulk loads around the gap to make sure that the gap is filled. Have I understood correctly? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Question about 'duplicate' columns
On Tue, Aug 6, 2013 at 6:10 PM, Aaron Morton aa...@thelastpickle.com wrote: Yes. If you overwrite much older data with new data both versions of the column will remain on disk until compaction gets to work on both fragments of the row. thanks Cheers - Aaron Morton Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 6/08/2013, at 6:48 PM, Franc Carter franc.car...@sirca.org.au wrote: I've been thinking through some cases that I can see happening at some point and thought I'd ask on the list to see if my understanding is correct. Say a bunch of columns have been loaded 'a long time ago', i.e. long enough in the past that they have been compacted. My understanding is that if some of these columns get reloaded then they are likely to sit in additional sstables until the larger sstable is called up for compaction, which might be a while. The case that springs to mind is filling small gaps in data by doing bulk loads around the gap to make sure that the gap is filled. Have I understood correctly? thanks
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: schema management
On Wed, Jul 3, 2013 at 2:06 AM, Silas Smith silas.sm...@gmail.com wrote: Franc, We manage our schema through the Astyanax driver. It runs in a listener at application startup. We read a self-defined schema version, update the schema if needed based on the version number, and then write the new schema version number. There is a chance two or more app servers will try to update the schema at the same time but in our testing we haven't seen any problems result from this even when we forced many servers to all update the schema with many different updates at the same time. And besides we typically do a rolling restart anyway. Todd, Mutagen Cassandra looks pretty similar to what we're doing, but is perhaps a bit more elegant. Will take a look at that now :) Cheers Thanks all, I'll likely stick to cassandra-cli scripts for this project and then look in to Cassandra-Mutagen cheers On Mon, Jul 1, 2013 at 5:55 PM, Franc Carter franc.car...@sirca.org.auwrote: On Tue, Jul 2, 2013 at 10:33 AM, Todd Fast t...@digitalexistence.comwrote: Franc-- I think you will find Mutagen Cassandra very interesting; it is similar to schema management tools like Flyway for SQL databases: Oops - forgot to mention in my original email that we will be looking into Mutagen Cassandra in the medium term. I'm after something with a low barrier to entry initially as we are quite time constrained. cheers Mutagen Cassandra is a framework (based on Mutagen) that provides schema versioning and mutation for Apache Cassandra. Mutagen is a lightweight framework for applying versioned changes (known as mutations) to a resource, in this case a Cassandra schema. Mutagen takes into account the resource's existing state and only applies changes that haven't yet been applied. Schema mutation with Mutagen helps you make manageable changes to the schema of live Cassandra instances as you update your software, and is especially useful when used across development, test, staging, and production environments to automatically keep schemas in sync. https://github.com/toddfast/mutagen-cassandra Todd On Mon, Jul 1, 2013 at 5:23 PM, sankalp kohli kohlisank...@gmail.comwrote: You can generate schema through the code. That is also one option. On Mon, Jul 1, 2013 at 4:10 PM, Franc Carter franc.car...@sirca.org.au wrote: Hi, I've been giving some thought to the way we deploy schemas and am looking for something better than out current approach, which is to use cassandra-cli scripts. What do people use for this ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
schema management
Hi, I've been giving some thought to the way we deploy schemas and am looking for something better than our current approach, which is to use cassandra-cli scripts. What do people use for this? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
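For reference, the cassandra-cli approach being replaced is usually just a versioned script applied against one node. A minimal sketch; the keyspace, column family and replication settings are placeholders:

  # schema.cli lives in version control and is applied with cassandra-cli -f
  cat > schema.cli <<'EOF'
  create keyspace app
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:3};
  use app;
  create column family events
    with comparator = UTF8Type
    and key_validation_class = UTF8Type
    and default_validation_class = UTF8Type;
  EOF
  cassandra-cli -h 127.0.0.1 -f schema.cli

The weakness the rest of the thread gets at is that a flat script like this has no notion of versions or incremental changes, which is what Mutagen Cassandra adds.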
Re: schema management
On Tue, Jul 2, 2013 at 10:33 AM, Todd Fast t...@digitalexistence.comwrote: Franc-- I think you will find Mutagen Cassandra very interesting; it is similar to schema management tools like Flyway for SQL databases: Oops - forgot to mention in my original email that we will be looking into Mutagen Cassandra in the medium term. I'm after something with a low barrier to entry initially as we are quite time constrained. cheers Mutagen Cassandra is a framework (based on Mutagen) that provides schema versioning and mutation for Apache Cassandra. Mutagen is a lightweight framework for applying versioned changes (known as mutations) to a resource, in this case a Cassandra schema. Mutagen takes into account the resource's existing state and only applies changes that haven't yet been applied. Schema mutation with Mutagen helps you make manageable changes to the schema of live Cassandra instances as you update your software, and is especially useful when used across development, test, staging, and production environments to automatically keep schemas in sync. https://github.com/toddfast/mutagen-cassandra Todd On Mon, Jul 1, 2013 at 5:23 PM, sankalp kohli kohlisank...@gmail.comwrote: You can generate schema through the code. That is also one option. On Mon, Jul 1, 2013 at 4:10 PM, Franc Carter franc.car...@sirca.org.auwrote: Hi, I've been giving some thought to the way we deploy schemas and am looking for something better than out current approach, which is to use cassandra-cli scripts. What do people use for this ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: crashed while running repair
On Sat, Jun 22, 2013 at 11:21 AM, sankalp kohli kohlisank...@gmail.comwrote: Looks like memory map failed. In a 64 bit system, you should have unlimited virtual memory but Linux has a limit on the number of maps. Looks at these two places. http://stackoverflow.com/questions/8892143/error-when-opening-a-lucene-index-map-failed https://blog.kumina.nl/2011/04/cassandra-java-io-ioerror-java-io-ioexception-map-failed/ That sounds very plausible, I have a CF with a very large number of files as I used the default sstable_size_in_mb, I'm following another thread on how to recover from that. cheers On Fri, Jun 21, 2013 at 3:22 PM, Franc Carter franc.car...@sirca.org.auwrote: Hi, I am experimenting with Cassandra-1.2.4, and got a crash while running repair. The nodes has 24GB of ram with an 8GB heap. Any ideas on my I may have missed in the config ? Log is below ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line 174) Exception in thread Thread[Thread-136019,5,main] FSReadError in /var/lib/cassandra/data/cut3/Price/cut3-Price-ib-44369-Index.db at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:200) at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168) at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:340) at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319) at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:194) at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122) at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238) at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748) at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:192) ... 8 more Caused by: java.lang.OutOfMemoryError: Map failed at sun.nio.ch.FileChannelImpl.map0(Native Method) at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745) ... 9 more ERROR [Thread-136019] 2013-06-22 06:30:05,865 FileUtils.java (line 375) Stopping gossiper thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
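A quick way to confirm the map-count theory from those links, and the standard knob to turn if it is the cause. The 1048575 value is just a commonly used large setting, not a tuned number:

  # how many memory-mapped regions the Cassandra process already holds
  sudo wc -l /proc/$(pgrep -f CassandraDaemon)/maps
  # the kernel limit those maps run into
  sysctl vm.max_map_count                    # 65530 by default on most distros
  # raise it, and persist the change across reboots
  sudo sysctl -w vm.max_map_count=1048575
  echo 'vm.max_map_count = 1048575' | sudo tee -a /etc/sysctl.conf

The longer-term fix, as discussed in the related thread on sstable sizes, is a larger sstable_size_in_mb so there are far fewer files to map in the first place.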
Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change
On Sat, Jun 22, 2013 at 10:42 AM, Wei Zhu wz1...@yahoo.com wrote: I think the new SSTable will be in the new size. In order to do that, you need to trigger a compaction so that the new SSTables will be generated. for LCS, there is no major compaction though. You can run a nodetool repair and hopefully you will bring some new SSTables and compactions will kick in. Or you can change the $CFName.json file under your data directory and move every SSTable to level 0. You need to stop your node, write a simple script to alter that file and start the node again. I think it will be helpful to have a nodetool command to change the SSTable Size and trigger the rebuild of the SSTables. I'd find that useful as well cheers Thanks. -Wei -- *From: *Robert Coli rc...@eventbrite.com *To: *user@cassandra.apache.org *Sent: *Friday, June 21, 2013 4:51:29 PM *Subject: *Re: Updated sstable size for LCS, ran upgradesstables, file sizes didn't change On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki andrew.biale...@gmail.com wrote: However when we run alter the column family and then run nodetool upgradesstables -a keyspace columnfamily, the files in the data directory have been re-written, but the file sizes are the same. Is this the expected behavior? If not, what's the right way to upgrade them. If this is expected, how can we benchmark the read/write performance with varying sstable sizes. It is expected, upgradesstables/scrub/clean compactions work on a single sstable at a time, they are not capable of combining or splitting them. In theory you could probably : 1) start out with the largest size you want to test 2) stop your node 3) use sstable_split [1] to split sstables 4) start node, test 5) repeat 2-4 I am not sure if there is anything about level compaction which makes this infeasible. =Rob [1] https://github.com/pcmanus/cassandra/tree/sstable_split -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
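A very rough sketch of Wei's manifest trick for a single column family. It assumes the 1.2-era leveled manifest keeps its usual generations/members JSON layout and that jq is installed; <keyspace> and <cf> are placeholders, and the backup matters because this is untested advice, not a supported procedure:

  sudo service cassandra stop
  cd /var/lib/cassandra/data/<keyspace>/<cf>
  sudo cp <cf>.json <cf>.json.bak
  # move every sstable back to level 0 so LCS re-levels (and re-sizes) everything
  sudo sh -c "jq '{generations: [{generation: 0, members: [.generations[].members[]]}]}' <cf>.json.bak > <cf>.json"
  sudo service cassandra start

Expect a long burst of compaction afterwards, since the node effectively re-levels the whole column family.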
Re: Compaction not running
On Fri, Jun 21, 2013 at 6:16 PM, aaron morton aa...@thelastpickle.comwrote: Do you think it's worth posting an issue, or not enough traceable evidence ? If you can reproduce it then certainly file a bug. I'll keep my eye on it to see if it happens again and there is a pattern cheers Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 20/06/2013, at 9:41 PM, Franc Carter franc.car...@sirca.org.au wrote: On Thu, Jun 20, 2013 at 7:27 PM, aaron morton aa...@thelastpickle.comwrote: nodetool compactionstats, gives pending tasks: 13120 If there are no errors in the log, I would say this is a bug. This happened after the node ran out of file descriptors, so an edge case wouldn't surprise me. I've rebuilt the node (blown the data way and am running a nodetool rebuild). Do you think it's worth posting an issue, or not enough traceable evidence ? cheers Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 19/06/2013, at 11:41 AM, Franc Carter franc.car...@sirca.org.au wrote: On Wed, Jun 19, 2013 at 9:34 AM, Bryan Talbot btal...@aeriagames.comwrote: Manual compaction for LCS doesn't really do much. It certainly doesn't compact all those little files into bigger files. What makes you think that compactions are not occurring? Yeah, that's what I thought, however:- nodetool compactionstats, gives pending tasks: 13120 Active compaction remaining time :n/a when I run nodetool compact in a loop the pending tasks goes down gradually. This node also has vastly higher latencies (x10) than the other nodes. I saw this with a previous CF than I 'manually compacted', and when the pending tasks reached low numbers (stuck on 9) then latencies were back to low milliseconds cheers -Bryan On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter franc.car...@sirca.org.au wrote: On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter franc.car...@sirca.org.au wrote: On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.comwrote: On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au wrote: We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting. You have (at least) two options : 1) increase file descriptors available to Cassandra with ulimit, if possible 2) increase the size of your sstables with levelled compaction, such that you have fewer of them Oops, I wasn't clear enough. I have increased the number of file descriptors and no longer have a file descriptor issue. However the node still doesn't compact automatically. If I run a 'nodetool compact' it will do a small amount of compaction and then stop. 
The Column Family is using LCS Any ideas on this - compaction is still not automatically running for one of my nodes thanks cheers =Rob -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
crashed while running repair
Hi, I am experimenting with Cassandra-1.2.4, and got a crash while running repair. The nodes have 24GB of RAM with an 8GB heap. Any ideas on what I may have missed in the config? Log is below
ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line 174) Exception in thread Thread[Thread-136019,5,main]
FSReadError in /var/lib/cassandra/data/cut3/Price/cut3-Price-ib-44369-Index.db
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:200)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.complete(MmappedSegmentedFile.java:168)
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:340)
at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:319)
at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:194)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
at org.apache.cassandra.io.util.MmappedSegmentedFile$Builder.createSegments(MmappedSegmentedFile.java:192)
... 8 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
... 9 more
ERROR [Thread-136019] 2013-06-22 06:30:05,865 FileUtils.java (line 375) Stopping gossiper
thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Compaction not running
On Thu, Jun 20, 2013 at 7:27 PM, aaron morton aa...@thelastpickle.comwrote: nodetool compactionstats, gives pending tasks: 13120 If there are no errors in the log, I would say this is a bug. This happened after the node ran out of file descriptors, so an edge case wouldn't surprise me. I've rebuilt the node (blown the data way and am running a nodetool rebuild). Do you think it's worth posting an issue, or not enough traceable evidence ? cheers Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 19/06/2013, at 11:41 AM, Franc Carter franc.car...@sirca.org.au wrote: On Wed, Jun 19, 2013 at 9:34 AM, Bryan Talbot btal...@aeriagames.comwrote: Manual compaction for LCS doesn't really do much. It certainly doesn't compact all those little files into bigger files. What makes you think that compactions are not occurring? Yeah, that's what I thought, however:- nodetool compactionstats, gives pending tasks: 13120 Active compaction remaining time :n/a when I run nodetool compact in a loop the pending tasks goes down gradually. This node also has vastly higher latencies (x10) than the other nodes. I saw this with a previous CF than I 'manually compacted', and when the pending tasks reached low numbers (stuck on 9) then latencies were back to low milliseconds cheers -Bryan On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter franc.car...@sirca.org.auwrote: On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter franc.car...@sirca.org.au wrote: On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.comwrote: On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au wrote: We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting. You have (at least) two options : 1) increase file descriptors available to Cassandra with ulimit, if possible 2) increase the size of your sstables with levelled compaction, such that you have fewer of them Oops, I wasn't clear enough. I have increased the number of file descriptors and no longer have a file descriptor issue. However the node still doesn't compact automatically. If I run a 'nodetool compact' it will do a small amount of compaction and then stop. The Column Family is using LCS Any ideas on this - compaction is still not automatically running for one of my nodes thanks cheers =Rob -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Performance Difference between Cassandra version
On Thu, Jun 20, 2013 at 9:18 AM, Raihan Jamal jamalrai...@gmail.com wrote: I am trying to see whether there will be any performance difference between Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly? Has anyone seen any major performance difference? We are part way through a performance comparison between 1.0.9 with Size Tiered Compaction and 1.2.4 with Leveled Compaction - for our use case it looks like a significant performance improvement on the read side. We are finding compaction lags when we do very large bulk loads, but for us this is an initialisation task and that's a reasonable trade-off cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Large number of files for Leveled Compaction
On Mon, Jun 17, 2013 at 3:37 PM, Franc Carter franc.car...@sirca.org.auwrote: On Mon, Jun 17, 2013 at 3:28 PM, Wei Zhu wz1...@yahoo.com wrote: default value of 5MB is way too small in practice. Too many files in one directory is not a good thing. It's not clear what should be a good number. I have heard people are using 50MB, 75MB, even 100MB. Do your own test o find a right number. Interesting - 50MB is the low end of what people are using - 5MB is a lot lower. I'll try a 50MB set Oops, forgot to ask - is there a way to get Cassandra to rebuild the sstables as bigger once I have updated the column family definition ? thanks cheers -Wei -- *From: *Franc Carter franc.car...@sirca.org.au *To: *user@cassandra.apache.org *Sent: *Sunday, June 16, 2013 10:15:22 PM *Subject: *Re: Large number of files for Leveled Compaction On Mon, Jun 17, 2013 at 2:59 PM, Manoj Mainali mainalima...@gmail.comwrote: Not in the case of LeveledCompaction. Only SizeTieredCompaction merges smaller sstables into large ones. With the LeveledCompaction, the sstables are always of fixed size but they are grouped into different levels. You can refer to this page http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra on details of how LeveledCompaction works. Yes, but it seems I've misinterpreted that page ;-( I took this paragraph In figure 3, new sstables are added to the first level, L0, and immediately compacted with the sstables in L1 (blue). When L1 fills up, extra sstables are promoted to L2 (violet). Subsequent sstables generated in L1 will be compacted with the sstables in L2 with which they overlap. As more data is added, leveled compaction results in a situation like the one shown in figure 4. to mean that once a level fills up it gets compacted into a higher level cheers Cheers Manoj On Mon, Jun 17, 2013 at 1:54 PM, Franc Carter franc.car...@sirca.org.au wrote: On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali mainalima...@gmail.comwrote: With LeveledCompaction, each sstable size is fixed and is defined by sstable_size_in_mb in the compaction configuration of CF definition and default value is 5MB. In you case, you may have not defined your own value, that is why your each sstable is 5MB. And if you dataset is huge, you will see a lot of sstable counts. Ok, seems like I do have (at least) an incomplete understanding. I realise that the minimum size is 5MB, but I thought compaction would merge these into a smaller number of larger sstables ? thanks Cheers Manoj On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win for us. The first step of testing was to push a fairly large slab of data into the Column Family - we did this much faster ( x100) than we would in a production environment. This has left the Column Family with about 140,000 files in the Column Family directory which seems way too high. On two of the nodes the CompactionStats show 2 outstanding tasks and on a third node there are over 13,000 outstanding tasks. However from looking at the log activity it looks like compaction has finished on all nodes. Is this number of files expected/normal ? 
cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
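For completeness, the size change itself is just a CQL alter (the keyspace and table names below are placeholders); the catch, as this thread establishes, is that existing sstables keep their old size until something rewrites them:

  cqlsh <<'CQL'
  ALTER TABLE mykeyspace.mytable
    WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 50};
  CQL

After that, newly written sstables come out at 50MB, while the old 5MB ones are only rewritten by ongoing compaction, a repair that streams data in, or the level-0 manifest reset mentioned earlier in this archive; running upgradesstables alone leaves the sizes as they are.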
Re: Compaction not running
On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter franc.car...@sirca.org.auwrote: On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au wrote: We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting. You have (at least) two options : 1) increase file descriptors available to Cassandra with ulimit, if possible 2) increase the size of your sstables with levelled compaction, such that you have fewer of them Oops, I wasn't clear enough. I have increased the number of file descriptors and no longer have a file descriptor issue. However the node still doesn't compact automatically. If I run a 'nodetool compact' it will do a small amount of compaction and then stop. The Column Family is using LCS Any ideas on this - compaction is still not automatically running for one of my nodes thanks cheers =Rob -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Compaction not running
On Wed, Jun 19, 2013 at 9:34 AM, Bryan Talbot btal...@aeriagames.comwrote: Manual compaction for LCS doesn't really do much. It certainly doesn't compact all those little files into bigger files. What makes you think that compactions are not occurring? Yeah, that's what I thought, however:- nodetool compactionstats, gives pending tasks: 13120 Active compaction remaining time :n/a when I run nodetool compact in a loop the pending tasks goes down gradually. This node also has vastly higher latencies (x10) than the other nodes. I saw this with a previous CF than I 'manually compacted', and when the pending tasks reached low numbers (stuck on 9) then latencies were back to low milliseconds cheers -Bryan On Tue, Jun 18, 2013 at 3:59 PM, Franc Carter franc.car...@sirca.org.auwrote: On Sat, Jun 15, 2013 at 11:49 AM, Franc Carter franc.car...@sirca.org.au wrote: On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.comwrote: On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au wrote: We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting. You have (at least) two options : 1) increase file descriptors available to Cassandra with ulimit, if possible 2) increase the size of your sstables with levelled compaction, such that you have fewer of them Oops, I wasn't clear enough. I have increased the number of file descriptors and no longer have a file descriptor issue. However the node still doesn't compact automatically. If I run a 'nodetool compact' it will do a small amount of compaction and then stop. The Column Family is using LCS Any ideas on this - compaction is still not automatically running for one of my nodes thanks cheers =Rob -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Large number of files for Leveled Compaction
On Fri, Jun 7, 2013 at 2:44 PM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win for us. The first step of testing was to push a fairly large slab of data into the Column Family - we did this much faster ( x100) than we would in a production environment. This has left the Column Family with about 140,000 files in the Column Family directory which seems way too high. On two of the nodes the CompactionStats show 2 outstanding tasks and on a third node there are over 13,000 outstanding tasks. However from looking at the log activity it looks like compaction has finished on all nodes. Is this number of files expected/normal?

An addendum to this. None of the *Data.db files are bigger than 5MB (including on the nodes that have finished compaction). I'm wondering if I have misunderstood Leveled Compaction, I thought that there should be data files of 50MB and 500MB (the dataset is 190GB) cheers cheers
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
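A rough sanity check on those numbers, assuming the 5MB default sstable size and the 190GB figure mentioned above (whether that figure is per node or cluster-wide changes the result, so treat this purely as an order-of-magnitude check; the data path is a placeholder):

  # how many sstables actually exist vs. total files in the directory
  ls /var/lib/cassandra/data/<keyspace>/<cf>/*-Data.db | wc -l
  ls /var/lib/cassandra/data/<keyspace>/<cf>/ | wc -l
  # at 5 MB per sstable, ~190 GB implies on the order of 40,000 sstables,
  # each made up of several component files (Data, Index, Filter, Statistics, ...)
  echo $(( 190 * 1024 / 5 ))

So a six-figure file count is roughly what the 5MB default produces on a dataset this size; it is an argument for raising sstable_size_in_mb, as the rest of the thread concludes, rather than a sign that compaction is broken.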
Re: Large number of files for Leveled Compaction
On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali mainalima...@gmail.com wrote: With LeveledCompaction, each sstable size is fixed and is defined by sstable_size_in_mb in the compaction configuration of the CF definition, and the default value is 5MB. In your case, you may have not defined your own value, that is why each of your sstables is 5MB. And if your dataset is huge, you will see a high sstable count.

Ok, seems like I do have (at least) an incomplete understanding. I realise that the minimum size is 5MB, but I thought compaction would merge these into a smaller number of larger sstables? thanks Cheers Manoj

On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win for us. The first step of testing was to push a fairly large slab of data into the Column Family - we did this much faster ( x100) than we would in a production environment. This has left the Column Family with about 140,000 files in the Column Family directory which seems way too high. On two of the nodes the CompactionStats show 2 outstanding tasks and on a third node there are over 13,000 outstanding tasks. However from looking at the log activity it looks like compaction has finished on all nodes. Is this number of files expected/normal? cheers
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
-- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Large number of files for Leveled Compaction
On Mon, Jun 17, 2013 at 2:59 PM, Manoj Mainali mainalima...@gmail.comwrote: Not in the case of LeveledCompaction. Only SizeTieredCompaction merges smaller sstables into large ones. With the LeveledCompaction, the sstables are always of fixed size but they are grouped into different levels. You can refer to this page http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra on details of how LeveledCompaction works. Yes, but it seems I've misinterpreted that page ;-( I took this paragraph In figure 3, new sstables are added to the first level, L0, and immediately compacted with the sstables in L1 (blue). When L1 fills up, extra sstables are promoted to L2 (violet). Subsequent sstables generated in L1 will be compacted with the sstables in L2 with which they overlap. As more data is added, leveled compaction results in a situation like the one shown in figure 4. to mean that once a level fills up it gets compacted into a higher level cheers Cheers Manoj On Mon, Jun 17, 2013 at 1:54 PM, Franc Carter franc.car...@sirca.org.auwrote: On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali mainalima...@gmail.comwrote: With LeveledCompaction, each sstable size is fixed and is defined by sstable_size_in_mb in the compaction configuration of CF definition and default value is 5MB. In you case, you may have not defined your own value, that is why your each sstable is 5MB. And if you dataset is huge, you will see a lot of sstable counts. Ok, seems like I do have (at least) an incomplete understanding. I realise that the minimum size is 5MB, but I thought compaction would merge these into a smaller number of larger sstables ? thanks Cheers Manoj On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter franc.car...@sirca.org.auwrote: Hi, We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win for us. The first step of testing was to push a fairly large slab of data into the Column Family - we did this much faster ( x100) than we would in a production environment. This has left the Column Family with about 140,000 files in the Column Family directory which seems way too high. On two of the nodes the CompactionStats show 2 outstanding tasks and on a third node there are over 13,000 outstanding tasks. However from looking at the log activity it looks like compaction has finished on all nodes. Is this number of files expected/normal ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Large number of files for Leveled Compaction
On Mon, Jun 17, 2013 at 3:28 PM, Wei Zhu wz1...@yahoo.com wrote: default value of 5MB is way too small in practice. Too many files in one directory is not a good thing. It's not clear what should be a good number. I have heard people are using 50MB, 75MB, even 100MB. Do your own test o find a right number. Interesting - 50MB is the low end of what people are using - 5MB is a lot lower. I'll try a 50MB set cheers -Wei -- *From: *Franc Carter franc.car...@sirca.org.au *To: *user@cassandra.apache.org *Sent: *Sunday, June 16, 2013 10:15:22 PM *Subject: *Re: Large number of files for Leveled Compaction On Mon, Jun 17, 2013 at 2:59 PM, Manoj Mainali mainalima...@gmail.comwrote: Not in the case of LeveledCompaction. Only SizeTieredCompaction merges smaller sstables into large ones. With the LeveledCompaction, the sstables are always of fixed size but they are grouped into different levels. You can refer to this page http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra on details of how LeveledCompaction works. Yes, but it seems I've misinterpreted that page ;-( I took this paragraph In figure 3, new sstables are added to the first level, L0, and immediately compacted with the sstables in L1 (blue). When L1 fills up, extra sstables are promoted to L2 (violet). Subsequent sstables generated in L1 will be compacted with the sstables in L2 with which they overlap. As more data is added, leveled compaction results in a situation like the one shown in figure 4. to mean that once a level fills up it gets compacted into a higher level cheers Cheers Manoj On Mon, Jun 17, 2013 at 1:54 PM, Franc Carter franc.car...@sirca.org.auwrote: On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali mainalima...@gmail.comwrote: With LeveledCompaction, each sstable size is fixed and is defined by sstable_size_in_mb in the compaction configuration of CF definition and default value is 5MB. In you case, you may have not defined your own value, that is why your each sstable is 5MB. And if you dataset is huge, you will see a lot of sstable counts. Ok, seems like I do have (at least) an incomplete understanding. I realise that the minimum size is 5MB, but I thought compaction would merge these into a smaller number of larger sstables ? thanks Cheers Manoj On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win for us. The first step of testing was to push a fairly large slab of data into the Column Family - we did this much faster ( x100) than we would in a production environment. This has left the Column Family with about 140,000 files in the Column Family directory which seems way too high. On two of the nodes the CompactionStats show 2 outstanding tasks and on a third node there are over 13,000 outstanding tasks. However from looking at the log activity it looks like compaction has finished on all nodes. Is this number of files expected/normal ? 
cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Compaction not running
On Sat, Jun 15, 2013 at 8:48 AM, Robert Coli rc...@eventbrite.com wrote: On Wed, Jun 12, 2013 at 3:26 PM, Franc Carter franc.car...@sirca.org.au wrote: We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting. You have (at least) two options : 1) increase file descriptors available to Cassandra with ulimit, if possible 2) increase the size of your sstables with levelled compaction, such that you have fewer of them Oops, I wasn't clear enough. I have increased the number of file descriptors and no longer have a file descriptor issue. However the node still doesn't compact automatically. If I run a 'nodetool compact' it will do a small amount of compaction and then stop. The Column Family is using LCS cheers =Rob -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Compaction not running
Hi, We are running a test system with Leveled compaction on Cassandra-1.2.4. While doing an initial load of the data one of the nodes ran out of file descriptors and since then it hasn't been automatically compacting. Any suggestions on how to fix this ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
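A sketch of checking and raising the descriptor limit on the affected node. The 100000 value is illustrative, and depending on how Cassandra was installed the setting may belong in /etc/security/limits.d/cassandra.conf rather than limits.conf:

  # current limit and current usage for the running process
  sudo grep 'open files' /proc/$(pgrep -f CassandraDaemon)/limits
  sudo ls /proc/$(pgrep -f CassandraDaemon)/fd | wc -l
  # raise the limit for the cassandra user and restart
  echo 'cassandra - nofile 100000' | sudo tee -a /etc/security/limits.conf
  sudo service cassandra restart

Raising the limit stops the immediate errors; as the rest of the thread shows, the node may still need further attention (or a rebuild) before compaction catches up again.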
Re: Cassandra (1.2.5) + Pig (0.11.1) Errors with large column families
-- Forwarded message -- From: Mark Lewandowski mark.e.lewandow...@gmail.com Date: Jun 8, 2013 8:03 AM Subject: Cassandra (1.2.5) + Pig (0.11.1) Errors with large column families To: user@cassandra.apache.org Cc:
I'm currently trying to get Cassandra (1.2.5) and Pig (0.11.1) to play nice together. I'm running a basic script:
rows = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
dump rows;
This fails for my column family which has ~100,000 rows. However, if I modify the script to this:
rows = LOAD 'cassandra://betable_games/bets' USING CassandraStorage();
rows = limit rows 7000;
dump rows;
Then it seems to work. 7000 is about as high as I've been able to get it before it fails. The error I keep getting is:
2013-06-07 14:58:49,119 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 4480
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.thrift.TException: Message length exceeded: 4480
at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
at org.apache.cassandra.thrift.Column.read(Column.java:535)
at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
... 13 more
I've seen a similar problem on this mailing list using Cassandra-1.2.3, however the fixes suggested on that thread (increasing thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb in cassandra.yaml) did not appear to have any effect. Has anyone else seen this issue, and how can I fix it? Thanks, -Mark
Large number of files for Leveled Compaction
Hi, We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win for us. The first step of testing was to push a fairly large slab of data into the Column Family - we did this much faster ( x100) than we would in a production environment. This has left the Column Family with about 140,000 files in the Column Family directory which seems way too high. On two of the nodes the CompactionStats show 2 outstanding tasks and on a third node there are over 13,000 outstanding tasks. However from looking at the log activity it looks like compaction has finished on all nodes. Is this number of files expected/normal ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 8355 2514 Level 4, 55 Harrington St, The Rocks NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
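[Editor's note: as a rough sanity check on the file count, here is a back-of-the-envelope calculation. The numbers are assumptions, not measurements: the 5 MB default sstable_size_in_mb that Cassandra 1.2 used for Leveled compaction (worth double-checking against your exact version), and roughly 8 component files per SSTable.]

    files_on_disk = 140000
    components_per_sstable = 8      # Data, Index, Filter, Statistics, Summary, CompressionInfo, TOC, ... (varies by version)
    sstable_size_mb = 5             # assumed 1.2 default for Leveled compaction
    sstables = files_on_disk / components_per_sstable            # ~17,500 SSTables
    print sstables, sstables * sstable_size_mb / 1024.0          # ~85 GB of data in the CF

If that arithmetic holds, a file count of this order is what Leveled compaction with the small default SSTable size produces rather than a sign of a stuck node, and raising sstable_size_in_mb on the column family is the usual lever for bringing it down.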
Re: two-node cassandra cluster
On Fri, Aug 24, 2012 at 8:25 PM, Jason Axelson ja...@engagestage.com wrote: Hi, I have an application that will be very dormant most of the time but will need high-bursting a few days out of the month. Since we are deploying on EC2 I would like to keep only one Cassandra server up most of the time and then on burst days I want to bring one more server up (with more RAM and CPU than the first) to help serve the load. What is the best way to do this? Should I take a different approach? Some notes about what I plan to do: * Bring the node up and repair it immediately * After the burst time is over decommission the powerful node * Use the always-on server as the seed node * My main question is how to get the nodes to share all the data since I want a replication factor of 2 (so both nodes have all the data) but that won't work while there is only one server. Should I bring up 2 extra servers instead of just one? Thanks, Jason Caveat: I haven't tried what I am about to suggest. Could you run the cluster on smaller instances for most of the time and then, when you need more performance, increase the instance size to get more CPU/Memory? If you use EBS with provisioned IOPs you should be able to make the transition reasonably quickly. cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Ball is rolling on High Performance Cassandra Cookbook second edition
does not understand what a column family is they will likely use cassandra incorrectly. This is my view as well. One of the big hurdles I noticed with developers moving to Cassandra is that there is a strong tendency to apply RDBMS thinking to Cassandra - this is unsurprising, the majority of data store conceptualisation exists in this framework. I can see using names that have connections with RDBMS is likely to encourage this. cheers Maybe this is just a semantics debate because a table in a column oriented database is different than a table in a row oriented database, but the column family data model is one of the cornerstones of Cassandra. Globally replacing column family with table for the text is not a good idea. We will have to be smart about it. As thrift, the cli, the internals, the high level clients will be like this for some time. I definitely plan to add an entire chapter on CQL. I think we can put it after the CLI chapter, the introduction of CQL can attempt to cover the ground between the old school and the new school thinking. Edward -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: enforcing ordering
On Thu, May 10, 2012 at 9:05 PM, aaron morton aa...@thelastpickle.com wrote: Kewl. I'd be interested to know what you come up with. Hi, it's taken some thought, however we now have a data model that we like and it does indeed make our major concerns non-existent ;-) So I thought I'd explain it in case someone is doing something sufficiently close to us that the model is useful (and in case we are doing something silly - I hope not). The data set consists of 30 years of daily data for several million entities, the data for each is a small number of different record types ( 10) where entity,date,record_type is unique. Each record_type can have a couple of hundred key/value pairs. The query that we need to do is Set_of_Values = Get(set_of_entities, date_range, set_of_keys) Where set_of_keys is likely to be most of the keys that are valid for the entities. One slight complication (the one that sparked my initial question) is that there are also corrections that completely replace the data for an entity,date,record_type, multiple versions of the corrections can be transmitted, but only one correction per entity/day/record_type The data model that we have designed has a single Column Family keyed by the entity with a composite column name consisting of date,version,record_type with the value being a protobuf packing of the key/value pairs from the record. The version is the 'receipt date of the data' - 'date the data is for'. The properties of this that we like are:- * Record insertion is idempotent allowing for multiple active/active order independent loaders, this is a really big win for us(1). * The random partitioner gives us good scalability across the entity dimension which is the largest dimension. * The column ordering makes it easy to find the most recent 'correct' value for an entity on a day. * The column ordering gives us reasonably efficient date range queries There are a couple of implications of this data model:- * We store more data than we have to in the ideal world. * We push the work of decoding/extracting information out of the protobuf on to the clients along with some of the version management. My view is that this is a reasonable trade-off for systems that can have large numbers of clients that are independent of each other as scaling client machines is not hard. Feedback welcome cheers (1) It's important as it allows us to use a large number of loading processes to insert the historical data that is pretty large in a short period of time. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 10/05/2012, at 3:03 PM, Franc Carter wrote: On Tue, May 8, 2012 at 8:21 PM, Franc Carter franc.car...@sirca.org.au wrote: On Tue, May 8, 2012 at 8:09 PM, aaron morton aa...@thelastpickle.com wrote: Can you store the corrections in a separate CF? We sat down and thought about this harder - it looks like a good solution for us that may make other hard problems go away - thanks. cheers Yes, I thought of that, but that turns one read in to two ;-( When the client reads the key, reads from the original and the corrections CF at the same time. Apply the correction only on the client side. When you have confirmed the ingest has completed, run a background job to apply the corrections, store the updated values and delete the correction data. I was thinking down this path, but I ended up chasing the rabbit down a deep hole of race conditions . . . 
cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/05/2012, at 9:35 PM, Franc Carter wrote: Hi, I'm wondering if there is a common 'pattern' to address a scenario we will have to deal with. We will be storing a set of Column/Value pairs per Key where the Column/Values are read from a set of files that we download regularly. We need the loading to be resilient and we can receive corrections for some of the Column/Values that can only be loaded after the initial data has been inserted. The challenge we have is that we have a strong preference for active/active loading of data and can't see how to achieve this without some form of serialisation (which Cassandra doesn't support - correct ?) thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW
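[Editor's note: for readers who want to see what a model like the one described above can look like in code, here is a minimal, illustrative pycassa sketch. It is not the poster's actual implementation; the keyspace and column family names are made up, and it assumes a column family created with a CompositeType(AsciiType, IntegerType, AsciiType) comparator for the (date, version, record_type) column names, with the protobuf bytes as the value.]

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('HistoryKeyspace', ['localhost:9160'])
    records = ColumnFamily(pool, 'EntityRecords')

    def store(entity, day, version, record_type, packed_record):
        # Idempotent: replaying a file just overwrites the same column, so multiple
        # active/active loaders can insert in any order.
        records.insert(entity, {(day, version, record_type): packed_record})

    def latest_for_day(entity, day):
        # Columns sort by (date, version, record_type); slice out one day and keep
        # the highest version per record_type - the client-side version management.
        best = {}
        for (d, version, rtype), value in records.get(entity, column_start=(day,), column_finish=(day,)).items():
            if rtype not in best or version > best[rtype][0]:
                best[rtype] = (version, value)
        return best

A date-range query is the same slice with different start/finish components, and the protobuf unpacking happens entirely in the client, as described in the thread.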
Re: Query
On Mon, Jun 4, 2012 at 7:36 PM, MOHD ARSHAD SALEEM marshadsal...@tataelxsi.co.in wrote: Hi all, I wanted to know how to read and write data using the Cassandra APIs. Is there any link related to a sample program? I did a Proof of Concept using a Python client - PyCassa ( https://github.com/pycassa/pycassa) which works well cheers Regards Arshad -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
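[Editor's note: for anyone landing on this thread looking for a starting point, a minimal pycassa read/write looks roughly like this; the keyspace, column family and row contents below are made up for illustration.]

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('DemoKeyspace', ['localhost:9160'])
    users = ColumnFamily(pool, 'Users')

    # write a row (an insert is also an update in Cassandra)
    users.insert('user1', {'name': 'Alice', 'email': 'alice@example.com'})

    # read the whole row back, or just selected columns
    print users.get('user1')
    print users.get('user1', columns=['email'])

    pool.dispose()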
Re: Number of keyspaces
On Wed, May 23, 2012 at 8:09 PM, aaron morton aa...@thelastpickle.com wrote: We were thinking of doing a major compaction after each year is 'closed off'. Not a terrible idea. Years tend to happen annually, so their growth pattern is well understood. This would mean that compactions for the current year were dealing with a smaller amount of data and hence be faster and have less impact on a day-to-day basis. Older data is compacted into higher tiers / generations so will not be included when compacting new data (background http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra). That said, there is a chance that at some point the big older files get compacted. i.e. if you get (by default) 4 X 100GB files they will get compacted into 1. I'm a bit nervous about leveled compaction as it's new(ish). It feels a bit like a premature optimisation. Yep, that's certainly possible - it's a habit I tend towards ;-( cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/05/2012, at 1:52 PM, Franc Carter wrote: On Wed, May 23, 2012 at 7:42 AM, aaron morton aa...@thelastpickle.com wrote: 1 KS with 24 CF's will use roughly the same resources as 24 KS's with 1 CF. Each CF: * loads the bloom filter for each SSTable * samples the index for each sstable * uses row and key cache * has a current memtable and potentially memtables waiting to flush. * has secondary index CF's I would generally avoid a data model that calls for CF's to be added in response to new entities or new data. Older data will be moved to larger files, and not included in compaction for newer data. We were thinking of doing a major compaction after each year is 'closed off'. This would mean that compactions for the current year were dealing with a smaller amount of data and hence be faster and have less impact on a day-to-day basis. Our query patterns will only infrequently cross year boundaries. Are we being naive ? cheers Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/05/2012, at 3:31 AM, Luís Ferreira wrote: I have 24 keyspaces, each with a column family and am considering changing it to 1 keyspace with 24 CFs. Would this be beneficial? On May 22, 2012, at 12:56 PM, samal wrote: Not ideally, now cass has global memtable tuning. Each cf corresponds to memory in ram. Year wise cf means it will be in read only state for next year, memtable will still consume ram. On 22-May-2012 5:01 PM, Franc Carter franc.car...@sirca.org.au wrote: On Tue, May 22, 2012 at 9:19 PM, aaron morton aa...@thelastpickle.com wrote: It's more the number of CF's than keyspaces. Oh - does increasing the number of Column Families affect performance ? The design we are working on at the moment is considering using a Column Family per year. We were thinking this would isolate compactions to a more manageable size as we don't update previous years. cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/05/2012, at 6:58 PM, R. Verlangen wrote: Yes, it does. However there's no real answer what's the limit: it depends on your hardware and cluster configuration. You might even want to search the archives of this mailinglist, I remember this has been asked before. Cheers! 2012/5/21 Luís Ferreira zamith...@gmail.com Hi, Does the number of keyspaces affect the overall cassandra performance? 
Cumprimentos, Luís Ferreira -- With kind regards, Robin Verlangen www.robinverlangen.nl -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 Cumprimentos, Luís Ferreira -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Number of keyspaces
On Tue, May 22, 2012 at 9:19 PM, aaron morton aa...@thelastpickle.comwrote: It's more the number of CF's than keyspaces. Oh - does increasing the number of Column Families affect performance ? The design we are working on at the moment is considering using a Column Family per year. We were thinking this would isolate compactions to a more manageable size as we don't update previous years. cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/05/2012, at 6:58 PM, R. Verlangen wrote: Yes, it does. However there's no real answer what's the limit: it depends on your hardware and cluster configuration. You might even want to search the archives of this mailinglist, I remember this has been asked before. Cheers! 2012/5/21 Luís Ferreira zamith...@gmail.com Hi, Does the number of keyspaces affect the overall cassandra performance? Cumprimentos, Luís Ferreira -- With kind regards, Robin Verlangen www.robinverlangen.nl -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Number of keyspaces
On Wed, May 23, 2012 at 7:42 AM, aaron morton aa...@thelastpickle.comwrote: 1 KS with 24 CF's will use roughly the same resources as 24 KS's with 1 CF. Each CF: * loads the bloom filter for each SSTable * samples the index for each sstable * uses row and key cache * has a current memtable and potentially memtables waiting to flush. * had secondary index CF's I would generally avoid a data model that calls for CF's to be added in response to new entities or new data. Older data will move moved to larger files, and not included in compaction for newer data. We were thinking of doing a major compaction after each year is 'closed off'. This would mean that compactions for the current year were dealing with a smaller amount of data and hence be faster and have less impact on a day-to-day basis. Our query patterns will only infrequently cross year boundaries. Are we being naive ? cheers Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 23/05/2012, at 3:31 AM, Luís Ferreira wrote: I have 24 keyspaces, each with a columns family and am considering changing it to 1 keyspace with 24 CFs. Would this be beneficial? On May 22, 2012, at 12:56 PM, samal wrote: Not ideally, now cass has global memtable tuning. Each cf correspond to memory in ram. Year wise cf means it will be in read only state for next year, memtable will still consume ram. On 22-May-2012 5:01 PM, Franc Carter franc.car...@sirca.org.au wrote: On Tue, May 22, 2012 at 9:19 PM, aaron morton aa...@thelastpickle.comwrote: It's more the number of CF's than keyspaces. Oh - does increasing the number of Column Families affect performance ? The design we are working on at the moment is considering using a Column Family per year. We were thinking this would isolate compactions to a more manageable size as we don't update previous years. cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 22/05/2012, at 6:58 PM, R. Verlangen wrote: Yes, it does. However there's no real answer what's the limit: it depends on your hardware and cluster configuration. You might even want to search the archives of this mailinglist, I remember this has been asked before. Cheers! 2012/5/21 Luís Ferreira zamith...@gmail.com Hi, Does the number of keyspaces affect the overall cassandra performance? Cumprimentos, Luís Ferreira -- With kind regards, Robin Verlangen www.robinverlangen.nl -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 Cumprimentos, Luís Ferreira -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: enforcing ordering
On Thu, May 10, 2012 at 9:05 PM, aaron morton aa...@thelastpickle.com wrote: Kewl. I'd be interested to know what you come up with. Sure - I'll post details once we have them nailed down. I suspect that it will be 'obvious in hindsight', I'm still suffering from RDBMS brain - which is interesting because I am not a database guy, but yet I still have these ingrained ways of thinking cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 10/05/2012, at 3:03 PM, Franc Carter wrote: On Tue, May 8, 2012 at 8:21 PM, Franc Carter franc.car...@sirca.org.au wrote: On Tue, May 8, 2012 at 8:09 PM, aaron morton aa...@thelastpickle.com wrote: Can you store the corrections in a separate CF? We sat down and thought about this harder - it looks like a good solution for us that may make other hard problems go away - thanks. cheers Yes, I thought of that, but that turns one read in to two ;-( When the client reads the key, reads from the original and the corrections CF at the same time. Apply the correction only on the client side. When you have confirmed the ingest has completed, run a background job to apply the corrections, store the updated values and delete the correction data. I was thinking down this path, but I ended up chasing the rabbit down a deep hole of race conditions . . . cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/05/2012, at 9:35 PM, Franc Carter wrote: Hi, I'm wondering if there is a common 'pattern' to address a scenario we will have to deal with. We will be storing a set of Column/Value pairs per Key where the Column/Values are read from a set of files that we download regularly. We need the loading to be resilient and we can receive corrections for some of the Column/Values that can only be loaded after the initial data has been inserted. The challenge we have is that we have a strong preference for active/active loading of data and can't see how to achieve this without some form of serialisation (which Cassandra doesn't support - correct ?) thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: enforcing ordering
On Tue, May 8, 2012 at 8:21 PM, Franc Carter franc.car...@sirca.org.auwrote: On Tue, May 8, 2012 at 8:09 PM, aaron morton aa...@thelastpickle.comwrote: Can you store the corrections in a separate CF? We sat down and thought about this harder - it looks like a good solution for us that may makel other hard problems go away - thanks. cheers Yes, I thought of that, but that turns on read in to two ;-( When the client reads the key, reads from the original the corrects CF at the same time. Apply the correction only on the client side. When you have confirmed the ingest has completed, run a background jobs to apply the corrections, store the updated values and delete the correction data. I was thinking down this path, but I ended up chasing the rabbit down a deep hole of race conditions . . . cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/05/2012, at 9:35 PM, Franc Carter wrote: Hi, I'm wondering if there is a common 'pattern' to address a scenario we will have to deal with. We will be storing a set of Column/Value pairs per Key where the Column/Values are read from a set of files that we download regularly. We need the loading to be resilient and we can receive corrections for some of the Column/Values that can only be loaded after the initial data has been inserted. The challenge we have is that we have a strong preference for active/active loading of data and can't see how to achieve this without some form of serialisation (which Cassandra doesn't support - correct ?) thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
enforcing ordering
Hi, I'm wondering if there is a common 'pattern' to address a scenario we will have to deal with. We will be storing a set of Column/Value pairs per Key where the Column/Values are read from a set of files that we download regularly. We need the loading to be resilient and we can receive corrections for some of the Column/Values that can only be loaded after the initial data has been inserted. The challenge we have is that we have a strong preference for active/active loading of data and can't see how to achieve this without some form of serialisation (which Cassandra doesn't support - correct ?) thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: enforcing ordering
On Tue, May 8, 2012 at 8:09 PM, aaron morton aa...@thelastpickle.com wrote: Can you store the corrections in a separate CF? Yes, I thought of that, but that turns one read in to two ;-( When the client reads the key, reads from the original and the corrections CF at the same time. Apply the correction only on the client side. When you have confirmed the ingest has completed, run a background job to apply the corrections, store the updated values and delete the correction data. I was thinking down this path, but I ended up chasing the rabbit down a deep hole of race conditions . . . cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 8/05/2012, at 9:35 PM, Franc Carter wrote: Hi, I'm wondering if there is a common 'pattern' to address a scenario we will have to deal with. We will be storing a set of Column/Value pairs per Key where the Column/Values are read from a set of files that we download regularly. We need the loading to be resilient and we can receive corrections for some of the Column/Values that can only be loaded after the initial data has been inserted. The challenge we have is that we have a strong preference for active/active loading of data and can't see how to achieve this without some form of serialisation (which Cassandra doesn't support - correct ?) thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: 200TB in Cassandra ?
On Fri, Apr 20, 2012 at 6:27 AM, aaron morton aa...@thelastpickle.com wrote: Couple of ideas: * take a look at compression in 1.X http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression * is there repetition in the binary data ? Can you save space by implementing content addressable storage ? The data is already very highly space optimised. We've come to the conclusion that Cassandra is probably not the right fit for the use case this time cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/04/2012, at 12:55 AM, Dave Brosius wrote: I think your math is 'relatively' correct. It would seem to me you should focus on how you can reduce the amount of storage you are using per item, if at all possible, if that node count is prohibitive. On 04/19/2012 07:12 AM, Franc Carter wrote: Hi, One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm worried. 200TB with RF=3 is 600TB = 600,000GB Which is 1000 nodes at 600GB per node I'm hoping I've missed something as 1000 nodes is not viable for us. cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: 200TB in Cassandra ?
On Sat, Apr 21, 2012 at 1:05 AM, Jake Luciani jak...@gmail.com wrote: What other solutions are you considering? Any OLTP style access of 200TB of data will require substantial IO. We currently use an in-house written database because when we first started our system there was nothing that handled our problem economically. We would like to use something more off the shelf to reduce maintenance and development costs. We've been looking at Hadoop for the computational component. However it looks like HDFS does not map to our storage patterns well as the latency is quite high. In addition the resilience model of the Name Node is a concern in our environment. We were thinking through whether using Cassandra for the Hadoop data store is viable for us, however we've come to the conclusion that it doesn't map well in this case. Do you know how big your working dataset will be? The system is batch, jobs could range between very small up to a moderate percentage of the data set. It's even possible that we could need to read the entire data set. How much we get resident is a cost/performance trade-off we need to make cheers -Jake On Fri, Apr 20, 2012 at 3:30 AM, Franc Carter franc.car...@sirca.org.au wrote: On Fri, Apr 20, 2012 at 6:27 AM, aaron morton aa...@thelastpickle.com wrote: Couple of ideas: * take a look at compression in 1.X http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression * is there repetition in the binary data ? Can you save space by implementing content addressable storage ? The data is already very highly space optimised. We've come to the conclusion that Cassandra is probably not the right fit for the use case this time cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/04/2012, at 12:55 AM, Dave Brosius wrote: I think your math is 'relatively' correct. It would seem to me you should focus on how you can reduce the amount of storage you are using per item, if at all possible, if that node count is prohibitive. On 04/19/2012 07:12 AM, Franc Carter wrote: Hi, One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm worried. 200TB with RF=3 is 600TB = 600,000GB Which is 1000 nodes at 600GB per node I'm hoping I've missed something as 1000 nodes is not viable for us. cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- http://twitter.com/tjake -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
200TB in Cassandra ?
Hi, One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm worried. 200TB with RF=3 is 600TB = 600,000GB Which is 1000 nodes at 600GB per node I'm hoping I've missed something as 1000 nodes is not viable for us. cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: RE 200TB in Cassandra ?
On Thu, Apr 19, 2012 at 9:38 PM, Romain HARDOUIN romain.hardo...@urssaf.frwrote: Cassandra supports data compression and depending on your data, you can gain a reduction in data size up to 4x. The data is gzip'd already ;-) 600 TB is a lot, hence requires lots of servers... Franc Carter franc.car...@sirca.org.au a écrit sur 19/04/2012 13:12:19 : Hi, One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm worried. 200TB with RF=3 is 600TB = 600,000GB Which is 1000 nodes at 600GB per node I'm hoping I've missed something as 1000 nodes is not viable for us. cheers -- Franc Carter | Systems architect | Sirca Ltd franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: 200TB in Cassandra ?
On Thu, Apr 19, 2012 at 10:07 PM, John Doe jd...@yahoo.com wrote: Franc Carter franc.car...@sirca.org.au One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm worried. 200TB with RF=3 is 600TB = 600,000GB Which is 1000 nodes at 600GB per node I'm hoping I've missed something as 1000 nodes is not viable for us. Why only 600GB per node? I had seen comments that you didn't want to put 'too much' data on to a single node and had seen the figure of 400GB thrown around as an approximate figure - I rounded up to 600GB to make the maths easy ;-) I'm hoping that my understanding is flawed ;-) cheers JD -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: RE 200TB in Cassandra ?
On Thu, Apr 19, 2012 at 10:16 PM, Yiming Sun yiming@gmail.com wrote: 600 TB is really a lot, even 200 TB is a lot. In our organization, storage at such scale is handled by our storage team and they purchase specialized (and very expensive) equipment from storage hardware vendors because at this scale, performance and reliability is absolutely critical. Yep that's what we currently do. We have 200TB sitting on a set of high end disk arrays which are running RAID6. I'm in the early stages of looking at whether this is still the best approach. but it sounds like your team may not be able to afford such equipment. 600GB per node will require a cloud and you need a data center to house them... but 2TB disks are common place nowadays and you can jam multiple 2TB disks into each node to reduce the number of machines needed. It all depends on what budget you have. The bit I am trying to understand is whether my figure of 400GB/node in practice for Cassandra is correct, or whether we can push the GB/node higher and if so how high cheers -- Y. On Thu, Apr 19, 2012 at 7:54 AM, Franc Carter franc.car...@sirca.org.au wrote: On Thu, Apr 19, 2012 at 9:38 PM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote: Cassandra supports data compression and depending on your data, you can gain a reduction in data size up to 4x. The data is gzip'd already ;-) 600 TB is a lot, hence requires lots of servers... Franc Carter franc.car...@sirca.org.au a écrit sur 19/04/2012 13:12:19 : Hi, One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm worried. 200TB with RF=3 is 600TB = 600,000GB Which is 1000 nodes at 600GB per node I'm hoping I've missed something as 1000 nodes is not viable for us. cheers -- Franc Carter | Systems architect | Sirca Ltd franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Largest 'sensible' value
On Wed, Apr 4, 2012 at 8:56 AM, Jonathan Ellis jbel...@gmail.com wrote: We use 2MB chunks for our CFS implementation of HDFS: http://www.datastax.com/dev/blog/cassandra-file-system-design thanks On Mon, Apr 2, 2012 at 4:23 AM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are in the early stages of thinking about a project that needs to store data that will be accessed by Hadoop. One of the concerns we have is around the Latency of HDFS as our use case is is not for reading all the data and hence we will need custom RecordReaders etc. I've seen a couple of comments that you shouldn't put large chunks in to a value - however 'large' is not well defined for the range of people using these solutions ;-) Doe anyone have a rough rule of thumb for how big a single value can be before we are outside sanity? thanks -- Franc Carter | Systems architect | Sirca Ltd franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Largest 'sensible' value
Hi, We are in the early stages of thinking about a project that needs to store data that will be accessed by Hadoop. One of the concerns we have is around the Latency of HDFS as our use case is not for reading all the data and hence we will need custom RecordReaders etc. I've seen a couple of comments that you shouldn't put large chunks in to a value - however 'large' is not well defined for the range of people using these solutions ;-) Does anyone have a rough rule of thumb for how big a single value can be before we are outside sanity? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: Largest 'sensible' value
On Tue, Apr 3, 2012 at 4:18 AM, Ben Coverston ben.covers...@datastax.com wrote: This is a difficult question to answer for a variety of reasons, but I'll give it a try, maybe it will be helpful, maybe not. The most obvious problem with this is that Thrift is buffer based, not streaming. That means that whatever the size of your chunk it needs to be received, deserialized, and processed by cassandra within a timeframe that we call the rpc_timeout (by default this is 10 seconds). Thanks. I suspect that 'not streaming' is the key, and not just from the Cassandra side - our use case has a subtle assumption of streaming on the client side. We could chop it up in to buckets and put each one in a time ordered column, but that defeats the purpose of why I was considering Cassandra - to avoid the latency of seeks in HDFS cheers Bigger buffers mean larger allocations, larger allocations mean that the JVM is working harder, and is more prone to fragmentation on the heap. With mixed workloads (lots of high latency, large requests and many very small low latency requests) larger buffers can also, over time, clog up the thread pool in a way that can cause your shorter queries to have to wait for your longer running queries to complete (to free up worker threads) making everything slow. This isn't a problem unique to Cassandra, everything that uses worker queues runs into some variant of this problem. As with everything else, you'll probably need to test your specific use case to see what 'too big' is for you. On Mon, Apr 2, 2012 at 9:23 AM, Franc Carter franc.car...@sirca.org.au wrote: Hi, We are in the early stages of thinking about a project that needs to store data that will be accessed by Hadoop. One of the concerns we have is around the Latency of HDFS as our use case is not for reading all the data and hence we will need custom RecordReaders etc. I've seen a couple of comments that you shouldn't put large chunks in to a value - however 'large' is not well defined for the range of people using these solutions ;-) Does anyone have a rough rule of thumb for how big a single value can be before we are outside sanity? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- Ben Coverston DataStax -- The Apache Cassandra Company -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
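[Editor's note: if the chunking route discussed above ever becomes attractive despite the seek-latency concern, here is a minimal pycassa sketch of the idea. The keyspace/column family names and the 2MB chunk size are illustrative only (the size is borrowed from the CFS figure mentioned earlier in the thread), and it assumes a comparator under which the zero-padded chunk names sort in order.]

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    CHUNK_SIZE = 2 * 1024 * 1024   # ~2MB per column, similar to the CFS chunk size

    pool = ConnectionPool('DemoKeyspace', ['localhost:9160'])
    blobs = ColumnFamily(pool, 'Blobs')

    def put_blob(key, data):
        # one row per blob, one column per chunk; zero-padded names keep them ordered
        for i in xrange(0, len(data), CHUNK_SIZE):
            blobs.insert(key, {'chunk-%08d' % (i // CHUNK_SIZE): data[i:i + CHUNK_SIZE]})

    def get_blob(key):
        # columns come back in comparator order, so concatenating rebuilds the blob
        # (a real implementation would page with column_start rather than one big slice)
        return ''.join(blobs.get(key, column_count=100000).values())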
Re: sstable image/pic ?
2012/2/28 Hontvári József Levente hontv...@flyordie.com * Does the column name get stored for every col/val for every key (which sort of worries me for long column names) Yes, the column name is stored with each value for every key, but it may not matter if you switch on compression, which AFAIK has only advantages and will be the default. I am also worried about the storage space, so I did a test. Yes - I'm using compression - I've seen the same outcome in one of our own systems. There is a MySQL table which I intend to move to Cassandra. It has about 40 columns with very long column names, the average is 15 characters. The column values are mostly 2-4 byte integers. On the other hand many columns are empty, specifically not NULL but 0. AFAIK MySQL is also able to optimize NOT NULL columns with 0 values to a single bit. In Cassandra I simply did not store a column if its value is the default 0. The table size, only data without indexes, in MySQL was about 2.5 GB with 7 million rows. In Cassandra it was about 12 GB without compression, and 3.4 GB with compression (which also includes a single index for the row keys). So with compression switched on, in this specific case the storage requirements are roughly the same on Cassandra and MySQL. Good to know - thanks * Is data in an sstable sorted by key then column or column then key Sorted by key and then sorted by column. thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
sstable image/pic ?
Hi, does anyone know of a picture/image that shows the layout of keys/columns/values in an sstable - I haven't been able to find one and am having a hard time visualising the layout from various descriptions and various overviews thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
data model advice
Hi, I've finished my first model and experiments with Cassandra with results I'm pretty happy with - so I thought I'd move on to something harder. We have a set of data that has a large number of entities (which is our primary search key), for each of the entities we have a smallish (100) number of sets of data. Each set has a further set that contains column/value pairs. The queries will be for an Entity, for one or more days for one or more of the subsets. Conceptually I would like to model it like this:- Entity { Day1: { TypeA: {col1:val1, col2:val2, . . . } TypeB: {col1:val1, col3:val3, . . . } . . } . . . DayN: { TypeB: {col3:val3, col5:val5, . . . } TypeD: {col3:val3, col6:val6, . . . } . . } } My understanding of the Cassandra data model is that I run out of map-depth to do this in my simplistic approach as the Days are super columns, the types are columns and then I don't have a col/val map left for data. Does anyone have advice on a good approach ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: data model advice
On Fri, Feb 24, 2012 at 2:54 PM, Martin Arrowsmith arrowsmith.mar...@gmail.com wrote: Hi Franc, Or, you can consider using composite columns. It is not recommended to use Super Columns anymore. Thanks, I'll look in to composite columns cheers Best wishes, Martin On Thu, Feb 23, 2012 at 7:51 PM, Indranath Ghosh indrana...@gmail.comwrote: How about using a composite row key like the following: Entity.Day1.TypeA: {col1:val1, col2:val2, . . . } Entity.Day1.TypeB: {col1:val1, col2:val2, . . . } . . Entity.DayN.TypeA: {col1:val1, col2:val2, . . . } Entity.DayN.TypeB: {col1:val1, col2:val2, . . . } It is better to avoid super columns.. -indra On Thu, Feb 23, 2012 at 6:36 PM, Franc Carter franc.car...@sirca.org.auwrote: Hi, I've finished my first model and experiments with Cassandra with result I'm pretty happy with - so I thought I'd move on to something harder. We have a set of data that has a large number of entities (which is our primary search key), for each of the entities we have a smallish (100) number of sets of data. Each set has a further set the contains column/vale pairs. The queries will be for an Entity, for one or more days for one or more of the subsets. Conceptually I would like to model like it like this:- Entity { Day1: { TypeA: {col1:val1, col2:val2, . . . } TypeB: {col1:val1, col3:val3, . . . } . . } . . . DayN: { TypeB: {col3:val3, col5:val5, . . . } TypeD: {col3:val3, col6:val6, . . . } . . } } My understanding of the Cassandra data model is that I run out of map-dept to do this in my simplistic approach as the Days are super columns, the types are column and then I don't have a col/val map left for data. Does anyone have advice on a good approach ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Indranath Ghosh Phone: 408-813-9207* -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: data model advice
On Fri, Feb 24, 2012 at 2:54 PM, Martin Arrowsmith arrowsmith.mar...@gmail.com wrote: Hi Franc, Or, you can consider using composite columns. It is not recommended to use Super Columns anymore. Best wishes, On first read it would seem that there is fair bit of overhead with composite columns as it's my understanding that the column name is stored with each value - or have I missed something ? cheers Martin On Thu, Feb 23, 2012 at 7:51 PM, Indranath Ghosh indrana...@gmail.comwrote: How about using a composite row key like the following: Entity.Day1.TypeA: {col1:val1, col2:val2, . . . } Entity.Day1.TypeB: {col1:val1, col2:val2, . . . } . . Entity.DayN.TypeA: {col1:val1, col2:val2, . . . } Entity.DayN.TypeB: {col1:val1, col2:val2, . . . } It is better to avoid super columns.. -indra On Thu, Feb 23, 2012 at 6:36 PM, Franc Carter franc.car...@sirca.org.auwrote: Hi, I've finished my first model and experiments with Cassandra with result I'm pretty happy with - so I thought I'd move on to something harder. We have a set of data that has a large number of entities (which is our primary search key), for each of the entities we have a smallish (100) number of sets of data. Each set has a further set the contains column/vale pairs. The queries will be for an Entity, for one or more days for one or more of the subsets. Conceptually I would like to model like it like this:- Entity { Day1: { TypeA: {col1:val1, col2:val2, . . . } TypeB: {col1:val1, col3:val3, . . . } . . } . . . DayN: { TypeB: {col3:val3, col5:val5, . . . } TypeD: {col3:val3, col6:val6, . . . } . . } } My understanding of the Cassandra data model is that I run out of map-dept to do this in my simplistic approach as the Days are super columns, the types are column and then I don't have a col/val map left for data. Does anyone have advice on a good approach ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Indranath Ghosh Phone: 408-813-9207* -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: List all keys with RandomPartitioner
On Wed, Feb 22, 2012 at 8:47 PM, Flavio Baronti f.baro...@list-group.comwrote: I need to iterate over all the rows in a column family stored with RandomPartitioner. When I reach the end of a key slice, I need to find the token of the last key in order to ask for the next slice. I saw in an old email that the token for a specific key can be recoveder through FBUtilities.hash(). That class however is inside the full Cassandra jar, not inside the client-specific part. Is there a way to iterate over all the keys which does not require the server-side Cassandra jar? Does this help ? http://wiki.apache.org/cassandra/FAQ#iter_world cheers Thanks Flavio -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
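[Editor's note: for reference, the approach in that FAQ entry looks like the following from a client that wraps the paging for you. This is a pycassa sketch - a different client from the one Flavio is using, and the keyspace/column family names are made up - but the same idea applies to any client built on get_range_slices.]

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, 'MyCF')

    # get_range pages through get_range_slices internally, so no token arithmetic is
    # needed; with RandomPartitioner the keys come back in token order, not key order.
    for key, columns in cf.get_range(buffer_size=1024):
        print key, len(columns)

    pool.dispose()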
Re: reads/s suddenly dropped
On Mon, Feb 20, 2012 at 9:42 PM, Franc Carter franc.car...@sirca.org.auwrote: On Mon, Feb 20, 2012 at 12:00 PM, aaron morton aa...@thelastpickle.comwrote: Aside from iostats.. nodetool cfstats will give you read and write latency for each CF. This is the latency for the operation on each node. Check that to see if latency is increasing. Take a look at nodetool compactionstats to see if compactions are running at the same time. The IO is throttled but if you are on aws it may not be throttled enough. compaction had finished The sweet spot for non netflix deployments seems to be a m1.xlarge with 16GB. THe JVM can have 8 and the rest can be used for memmapping files. Here is a good post about choosing EC2 sizes… http://perfcap.blogspot.co.nz/2011/03/understanding-and-using-amazon-ebs.html Thanks - good article. I'll go up to m1.xlarge and explore that behaviour the m1.xlarge is giving much better and more consistent results thanks cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/02/2012, at 9:31 AM, Franc Carter wrote: On Mon, Feb 20, 2012 at 4:10 AM, Philippe watche...@gmail.com wrote: Perhaps your dataset can no longer be held in memory. Check iostats I have been flushing the keycache and dropping the linux disk caches before each to avoid testing memory reads. One possibility that I thought of is that the success keys are now 'far enough away' that they are not being included in the previous read and hence the seek penalty has to be paid a lot more often - viable ? cheers Le 19 févr. 2012 11:24, Franc Carter franc.car...@sirca.org.au a écrit : I've been testing Cassandra - primarily looking at reads/second for our fairly data model - one unique key with a row of columns that we always request. I've now setup the cluster with with m1.large (2 cpus 8GB) I had loaded a months worth of data in and was doing random requests as a torture test - and getting very nice results. I then loaded another days worth of day and repeated the tests while the load was running - still good. I then started loading more days and at some point the performance dropped by close to an order of magnitude ;-( Any ideas on what to look for ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: reads/s suddenly dropped
On Mon, Feb 20, 2012 at 12:00 PM, aaron morton aa...@thelastpickle.comwrote: Aside from iostats.. nodetool cfstats will give you read and write latency for each CF. This is the latency for the operation on each node. Check that to see if latency is increasing. Take a look at nodetool compactionstats to see if compactions are running at the same time. The IO is throttled but if you are on aws it may not be throttled enough. compaction had finished The sweet spot for non netflix deployments seems to be a m1.xlarge with 16GB. THe JVM can have 8 and the rest can be used for memmapping files. Here is a good post about choosing EC2 sizes… http://perfcap.blogspot.co.nz/2011/03/understanding-and-using-amazon-ebs.html Thanks - good article. I'll go up to m1.xlarge and explore that behaviour cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 20/02/2012, at 9:31 AM, Franc Carter wrote: On Mon, Feb 20, 2012 at 4:10 AM, Philippe watche...@gmail.com wrote: Perhaps your dataset can no longer be held in memory. Check iostats I have been flushing the keycache and dropping the linux disk caches before each to avoid testing memory reads. One possibility that I thought of is that the success keys are now 'far enough away' that they are not being included in the previous read and hence the seek penalty has to be paid a lot more often - viable ? cheers Le 19 févr. 2012 11:24, Franc Carter franc.car...@sirca.org.au a écrit : I've been testing Cassandra - primarily looking at reads/second for our fairly data model - one unique key with a row of columns that we always request. I've now setup the cluster with with m1.large (2 cpus 8GB) I had loaded a months worth of data in and was doing random requests as a torture test - and getting very nice results. I then loaded another days worth of day and repeated the tests while the load was running - still good. I then started loading more days and at some point the performance dropped by close to an order of magnitude ;-( Any ideas on what to look for ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
reads/s suddenly dropped
I've been testing Cassandra - primarily looking at reads/second for our fairly data model - one unique key with a row of columns that we always request. I've now set up the cluster with m1.large (2 cpus 8GB) I had loaded a month's worth of data in and was doing random requests as a torture test - and getting very nice results. I then loaded another day's worth of data and repeated the tests while the load was running - still good. I then started loading more days and at some point the performance dropped by close to an order of magnitude ;-( Any ideas on what to look for ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: reads/s suddenly dropped
On Mon, Feb 20, 2012 at 4:10 AM, Philippe watche...@gmail.com wrote: Perhaps your dataset can no longer be held in memory. Check iostats I have been flushing the keycache and dropping the linux disk caches before each test to avoid testing memory reads. One possibility that I thought of is that the successive keys are now 'far enough away' that they are not being included in the previous read and hence the seek penalty has to be paid a lot more often - viable ? cheers On 19 Feb 2012 11:24, Franc Carter franc.car...@sirca.org.au wrote: I've been testing Cassandra - primarily looking at reads/second for our fairly simple data model - one unique key with a row of columns that we always request. I've now set up the cluster with m1.large (2 CPUs, 8GB). I had loaded a month's worth of data in and was doing random requests as a torture test - and getting very nice results. I then loaded another day's worth of data and repeated the tests while the load was running - still good. I then started loading more days and at some point the performance dropped by close to an order of magnitude ;-( Any ideas on what to look for ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
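A sketch of the kind of pre-test reset described above, under the assumption that the interesting part is dropping the Linux page cache (which needs root). The Cassandra-side cache clearing is deliberately left as a placeholder because the exact nodetool subcommand differs between versions.

    # Sketch of the pre-test reset described above. Dropping the Linux page
    # cache needs root; the Cassandra-side cache clearing is left as a
    # placeholder because the exact nodetool subcommand differs by version.
    import subprocess
    import time

    def drop_os_caches():
        subprocess.check_call(["sync"])
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")                  # free page cache, dentries and inodes

    def timed(fn):
        start = time.time()
        fn()
        return time.time() - start

    # usage, where run_query is whatever read workload is being benchmarked:
    # drop_os_caches()
    # print("cold %.2fs, warm %.2fs" % (timed(run_query), timed(run_query)))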
Re: Key cache hit rate issue
On 17/02/2012 8:53 AM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote: Hi Jonathan, Thanks for the reply. Yes there is a possibility that the keys can be distributed in multiple SSTables, but my data access patterns are such that I always read/write the whole row. So I expect all the data to be in the same SSTable (please correct me if I'm wrong). For some reason 16637958 (the keys cached) has become a golden number and I don't see key cache increasing beyond that. I also checked memory and I have about 4GB left in JVM memory and didn't see any issues on logs. I have seen the same thing with the keycache size becoming static cheers Thanks, Eran Chinthaka Withana On Thu, Feb 16, 2012 at 1:20 PM, Jonathan Ellis jbel...@gmail.com wrote: So, you have roughly 1/6 of your (physical) row keys cached and about 1/4 cache hit rate, which doesn't sound unreasonable to me. Remember, each logical key may be spread across multiple physical sstables -- each (key, sstable) pair is one entry in the key cache. On Thu, Feb 16, 2012 at 1:48 PM, Eran Chinthaka Withana eran.chinth...@gmail.com wrote: Hi Aaron, Here it is. Keyspace: Read Count: 1123637972 Read Latency: 5.757938114343114 ms. Write Count: 128201833 Write Latency: 0.0682576607387509 ms. Pending Tasks: 0 Column Family: YY SSTable count: 18 Space used (live): 103318720685 Space used (total): 103318720685 Number of Keys (estimate): 92404992 Memtable Columns Count: 1425580 Memtable Data Size: 359655747 Memtable Switch Count: 2522 Read Count: 1123637972 Read Latency: 14.731 ms. Write Count: 128201833 Write Latency: NaN ms. Pending Tasks: 0 Bloom Filter False Postives: 1488 Bloom Filter False Ratio: 0.0 Bloom Filter Space Used: 331522920 Key cache capacity: 16637958 Key cache size: 16637958 Key cache hit rate: 0.2708 Row cache: disabled Compacted row minimum size: 51 Compacted row maximum size: 6866 Compacted row mean size: 2560 Thanks, Eran Chinthaka Withana On Thu, Feb 16, 2012 at 12:30 AM, aaron morton aa...@thelastpickle.com wrote: Its in the order of 261 to 8000 and the ratio is 0.00. But i guess 8000 is bit high. Is there a way to fix/improve it? Sorry I don't understand what you mean. But if the ratio is 0.0 all is good. Could you include the full output from cfstats for the CF you are looking at ? Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/02/2012, at 1:00 PM, Eran Chinthaka Withana wrote: Its in the order of 261 to 8000 and the ratio is 0.00. But i guess 8000 is bit high. Is there a way to fix/improve it? Thanks, Eran Chinthaka Withana On Tue, Feb 14, 2012 at 3:42 PM, aaron morton aa...@thelastpickle.com wrote: Out of interest what does cfstats say about the bloom filter stats ? A high false positive could lead to a low key cache hit rate. Also, is there a way to warm start the key cache, meaning pre-load the amount of keys I set as keys_cached? See key_cache_save_period when creating the CF. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/02/2012, at 5:54 AM, Eran Chinthaka Withana wrote: Hi, I'm using Cassandra 1.0.7 and I've set the keys_cached to about 80% (using the numerical values). This is visible in cfstats too. But I'm getting less than 20% (or sometimes even 0%) key cache hit rate. Well, the data access pattern is not the issue here as I know they are retrieving the same row multiple times. I'm using hector client with dynamic load balancing policy with consistency ONE for both reads and writes. 
Any ideas on how to find the issue and fix this? Here is what I see on cfstats. Key cache capacity: 16637958 Key cache size: 16637958 Key cache hit rate: 0.045454545454545456 Also, is there a way to warm start the key cache, meaning pre-load the amount of keys I set as keys_cached? Thanks, Eran -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
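A rough back-of-envelope version of Jonathan's point, using the cfstats figures quoted above. The sstables-per-read figure is an assumption, not something reported in the thread.

    # Back-of-envelope version of the reasoning above, using the cfstats figures
    # quoted in this thread. Each (row key, sstable) pair is a separate key-cache
    # entry, so the cache covers fewer logical rows than its capacity suggests.
    keys_estimate  = 92404992    # "Number of Keys (estimate)"
    sstable_count  = 18          # "SSTable count"
    cache_capacity = 16637958    # "Key cache capacity"

    print("capacity / logical keys = %.2f" % (float(cache_capacity) / keys_estimate))

    # Assumption (not reported in the thread): a typical read touches ~2 sstables,
    # so the pool of distinct (key, sstable) entries the workload can ask for is
    # roughly twice the key count.
    sstables_per_read = 2
    fraction = float(cache_capacity) / (keys_estimate * sstables_per_read)
    print("cache covers ~%.0f%% of reachable (key, sstable) entries" % (100 * fraction))

With these numbers the cache holds roughly 18% of logical keys but only about 9% of the (key, sstable) entries under that assumption, so the reported ~27% hit rate on a skewed access pattern is not unreasonable.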
Re: stalled bootstrap
On Wed, Feb 15, 2012 at 10:21 AM, aaron morton aa...@thelastpickle.com wrote: The assertion looks like a bug. Can you run it with DEBUG logging ? Sorry - I had to blow the instances away. I'm on a tight timeline for the Proof of Concept I am doing and rebuilding a 4-node cluster from scratch was going to be way faster. If I get time I'll try to reproduce it towards the end of the project - sorry. Do you have compression enabled ? Yes - SnappyCompressor Can you please submit a ticket here https://issues.apache.org/jira/browse/CASSANDRA with the extra info and update the email thread. Would you still like this even though I can't get much detail ? I *think* that the node this is happening on is failing to create the temp file in IncomingStreamReader.streamIn and then it's trying to delete the file before it retries. Extra debugging may be a help. The assertion is hiding the original error. Can you check if the new node can create files in the data directory ? I'll try these if I can get time to retest - thanks for the pointers cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 15/02/2012, at 12:42 AM, Franc Carter wrote: Hi, I'm running the DataStax 1.0.7 AMI in EC2. I started with two nodes and have just added a third node on the way to expanding to a four node cluster. The bootstrapping was going along ok for a while, but has stalled. In /var/log/cassandra/system.log I am seeing this repeated continuously (tmp file changes each time) INFO [Thread-529373] 2012-02-14 11:36:18,350 StreamInSession.java (line 120) Streaming of file /raid0/cassandra/data/OpsCenter/rollups7200-hc-1-Data.db sections=2 progress=0/42387 - 0% from org.apache.cassandra.streaming.StreamInSession@6ebcf58a failed: requesting a retry. ERROR [Thread-529373] 2012-02-14 11:36:18,351 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-529373,5,main] java.lang.AssertionError: attempted to delete non-existing file rollups7200-tmp-hc-529319-Data.db at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49) at org.apache.cassandra.streaming.IncomingStreamReader.retry(IncomingStreamReader.java:172) at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:92) at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81) Any advice on how to resolve this ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
stalled bootstrap
Hi, I'm running the DataStax 1.0.7 AMI in EC2. I started with two nodes and have just added a third node on the way to expanding to a four node cluster. The bootstrapping was going along ok for a while, but has stalled. In /var/log/cassandra/system.log I am seeing this repeated continuously (tmp file changes each time) INFO [Thread-529373] 2012-02-14 11:36:18,350 StreamInSession.java (line 120) Streaming of file /raid0/cassandra/data/OpsCenter/rollups7200-hc-1-Data.db sections=2 progress=0/42387 - 0% from org.apache.cassandra.streaming.StreamInSession@6ebcf58a failed: requesting a retry. ERROR [Thread-529373] 2012-02-14 11:36:18,351 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-529373,5,main] java.lang.AssertionError: attempted to delete non-existing file rollups7200-tmp-hc-529319-Data.db at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49) at org.apache.cassandra.streaming.IncomingStreamReader.retry(IncomingStreamReader.java:172) at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:92) at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185) at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81) Any advice on how to resolve this ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: active/pending queue lengths
On Tue, Feb 14, 2012 at 8:01 PM, aaron morton aa...@thelastpickle.comwrote: And the output from tpstats is ? I can't reproduce it at the moment ;-( nodetool is throwing 'Failed to retrieve RMIServer stub:' - which I'm guessing/hoping is related to the stalled bootstrap. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/02/2012, at 12:43 PM, Franc Carter wrote: On Tue, Feb 14, 2012 at 6:06 AM, aaron morton aa...@thelastpickle.comwrote: What CL are you reading at ? Quorum Write ops go to RF number of nodes, read ops go to RF number of nodes 10% (the default probability that Read Repair will be running) of the time and CL number of nodes 90% of the time. With 2 nodes and RF 2 the QUOURM is 2, every request will involve all nodes. Yep, the thing tat confuses is the different behaviour for reading from one node versus two As to why the pending list gets longer, do you have some more info ? What process are you using to measure ? It's hard to guess why. In this setup every node will have the data and should be able to do a local read and then on the other node. I have four pycassa clients, two making requests to one server and two making requests to the other (or all four making requests to the same server). The requested keys don't overlap and I would expect/assume the keys are in the keycache I am looking at the output of nodetool -h tpstats cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/02/2012, at 12:47 AM, Franc Carter wrote: Hi, I've been looking at tpstats as various test queries run and I noticed something I don't understand. I have a two node cluster with RF=2 on which I run 4 parallel queries, each job goes through a list of keys doing a multiget for 2 keys at a time. If two of the queries go to one node and the other two go to a different node then the pending queue on the node gets much longer than if they all go to the one node. I'm clearly missing something here as I would have expected the opposite cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
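For reference, the four test clients described above amount to something like the following pycassa sketch. Keyspace, column family and host names are placeholders, not values taken from the thread.

    # Approximate reconstruction of the four test clients: each walks its own
    # disjoint key list, multigetting two keys per call at QUORUM. Keyspace,
    # column family and host names are placeholders.
    import pycassa

    pool = pycassa.ConnectionPool("MyKeyspace", server_list=["node1:9160"])
    cf = pycassa.ColumnFamily(pool, "MyCF")
    cf.read_consistency_level = pycassa.ConsistencyLevel.QUORUM

    def run_client(keys, batch=2):
        for i in range(0, len(keys), batch):
            cf.multiget(keys[i:i + batch])

    # Two processes pointed at node1 and two at node2 (separate ConnectionPools),
    # each given a non-overlapping key list, reproduce the setup in the thread.

Because the reads are at QUORUM with RF=2, every multiget involves both nodes regardless of which coordinator the client connects to, which is part of why the pending-queue behaviour is surprising.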
nodetool removetoken
I teminated (ec2 destruction) a node that I was wedged during bootstrap. However when I try to removetoken I get 'Token not found'. It looks a bit like this issue ? https://issues.apache.org/jira/browse/CASSANDRA-3737 nodetool -h 127.0.0.1 ring gives this Address DC RackStatus State Load OwnsToken 85070591730234615865843651857942052864 10.253.65.203 us-east 1a Up Normal 11.18 GB 50.00% 0 10.252.82.64us-east 1a Down Joining 320.45 KB 25.00% 42535295865117307932921825928971026432 10.253.86.224 us-east 1a Up Normal 11.01 GB 25.00% 85070591730234615865843651857942052864 and nodetool -h 127.0.0.1 removetoken 42535295865117307932921825928971026432 gives xception in thread main java.lang.UnsupportedOperationException: Token not found. at org.apache.cassandra.service.StorageService.removeToken(StorageService.java:2369) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Any ideas on how to deal with this ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: nodetool removetoken
On Wed, Feb 15, 2012 at 8:49 AM, Brandon Williams dri...@gmail.com wrote: Before 1.0.8, use https://issues.apache.org/jira/browse/CASSANDRA-3337 to remove it. I'm missing something ;-( I don't see a solution in this link . . cheers On Tue, Feb 14, 2012 at 3:44 PM, Franc Carter franc.car...@sirca.org.au wrote: I teminated (ec2 destruction) a node that I was wedged during bootstrap. However when I try to removetoken I get 'Token not found'. It looks a bit like this issue ? https://issues.apache.org/jira/browse/CASSANDRA-3737 nodetool -h 127.0.0.1 ring gives this Address DC RackStatus State Load OwnsToken 85070591730234615865843651857942052864 10.253.65.203 us-east 1a Up Normal 11.18 GB 50.00% 0 10.252.82.64us-east 1a Down Joining 320.45 KB 25.00% 42535295865117307932921825928971026432 10.253.86.224 us-east 1a Up Normal 11.01 GB 25.00% 85070591730234615865843651857942052864 and nodetool -h 127.0.0.1 removetoken 42535295865117307932921825928971026432 gives xception in thread main java.lang.UnsupportedOperationException: Token not found. at org.apache.cassandra.service.StorageService.removeToken(StorageService.java:2369) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:27) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Any ideas on how to deal with this ? 
thanks -- Franc Carter | Systems architect | Sirca Ltd franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: nodetool removetoken
On Wed, Feb 15, 2012 at 9:25 AM, Rob Coli rc...@palominodb.com wrote: On Tue, Feb 14, 2012 at 2:02 PM, Franc Carter franc.car...@sirca.org.auwrote: On Wed, Feb 15, 2012 at 8:49 AM, Brandon Williams dri...@gmail.comwrote: Before 1.0.8, use https://issues.apache.org/jira/browse/CASSANDRA-3337 to remove it. I'm missing something ;-( I don't see a solution in this link . . The solution is a patch : https://issues.apache.org/jira/secure/attachment/12500248/3337.txt If you apply this patch to your cassandra server, it will generate a JMX endpoint which will allow you to kill the token. Ahh - thanks cheers =Rob -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
2012/2/13 R. Verlangen ro...@us2.nl This is because of the warm up of Cassandra as it starts. On a start it will start fetching the rows that were cached: this will have to be loaded from the disk, as there is nothing in the cache yet. You can read more about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations I actually have the opposite 'problem'. I have a pair of servers that have been static since mid last week, but have seen performance vary significantly (x10) for exactly the same query. I hypothesised it was various caches so I shut down Cassandra, flushed the O/S buffer cache and then brought it back up. The performance wasn't significantly different to the pre-flush performance cheers 2012/2/13 Franc Carter franc.car...@sirca.org.au On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng zhangch...@jike.com wrote: ** I think the keycaches and row caches are both persisted to disk when shut down, and restored from disk when restarted, which then improves the performance. Thanks - that would explain at least some of what I am seeing cheers 2012-02-13 -- zhangcheng -- *From:* Franc Carter *Sent:* 2012-02-13 13:53:56 *To:* user *Cc:* *Subject:* keycache persisted to disk ? Hi, I am testing Cassandra on Amazon and finding performance can vary fairly wildly. I'm leaning towards it being an artifact of the AWS I/O system but have one other possibility. Are keycaches persisted to disk and restored on a clean shutdown and restart ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 7:21 PM, Peter Schuller peter.schul...@infidyne.com wrote: I actually have the opposite 'problem'. I have a pair of servers that have been static since mid last week, but have seen performance vary significantly (x10) for exactly the same query. I hypothesised it was various caches so I shut down Cassandra, flushed the O/S buffer cache and then brought it back up. The performance wasn't significantly different to the pre-flush performance I don't get this thread at all :) Why would restarting with clean caches be expected to *improve* performance? I was expecting it to reduce performance due to cleaning of keycache and O/S buffer cache - performance stayed roughly the same And why is key cache loading involved other than to delay start-up and hopefully pre-populating caches for better (not worse) performance? If you want to figure out why queries seem to be slow relative to normal, you'll need to monitor the behavior of the nodes. Look at disk I/O statistics primarily (everyone reading this running Cassandra who isn't intimately familiar with iostat -x -k 1 should go and read up on it right away; make sure you understand the utilization and avg queue size columns), CPU usage, whether compaction is happening, etc. Yep - I've been looking at these - I don't see anything in iostat/dstat etc. that points strongly to a problem. There is quite a bit of I/O load, but it looks roughly uniform on slow and fast instances of the queries. The last compaction ran 4 days ago - which was before I started seeing variable performance One easy way to see sudden bursts of poor behavior is to be heavily reliant on cache, and then have sudden decreases in performance due to compaction evicting data from page cache while also generating more I/O. Unlikely to be a cache issue - In one case an immediate second run of exactly the same query performed significantly worse. But that's total speculation. It is also the case that you cannot expect consistent performance on EC2 and that might be it. Variable performance from EC2 is my lead theory at the moment. But my #1 advice: Log into the node while it is being slow, and observe. Figure out what the bottleneck is. iostat, top, nodetool tpstats, nodetool netstats, nodetool compactionstats. I know why it is slow - it's clearly I/O bound. I am trying to hunt down why it is sometimes much faster even though I have (tried) to replicate the same conditions -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
2012/2/13 R. Verlangen ro...@us2.nl I also noticed that, Cassandra appears to perform better under a continues load. Are you sure the rows you're quering are actually in the cache? I'm making an assumption . . . I don't yet know enough about cassandra to prove they are in the cache. I have my keycache set to 2 million, and am only querying ~900,000 keys. so after the first time I'm assuming they are in the cache. cheers 2012/2/13 Franc Carter franc.car...@sirca.org.au 2012/2/13 R. Verlangen ro...@us2.nl This is because of the warm up of Cassandra as it starts. On a start it will start fetching the rows that were cached: this will have to be loaded from the disk, as there is nothing in the cache yet. You can read more about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations I actually has the opposite 'problem'. I have a pair of servers that have been static since mid last week, but have seen performance vary significantly (x10) for exactly the same query. I hypothesised it was various caches so I shut down Cassandra, flushed the O/S buffer cache and then bought it back up. The performance wasn't significantly different to the pre-flush performance cheers 2012/2/13 Franc Carter franc.car...@sirca.org.au On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng zhangch...@jike.comwrote: ** I think the keycaches and rowcahches are bothe persisted to disk when shutdown, and restored from disk when restart, then improve the performance. Thanks - that would explain at least some of what I am seeing cheers 2012-02-13 -- zhangcheng -- *发件人:* Franc Carter *发送时间:* 2012-02-13 13:53:56 *收件人:* user *抄送:* *主题:* keycache persisted to disk ? Hi, I am testing Cassandra on Amazon and finding performance can vary fairly wildly. I'm leaning towards it being an artifact of the AWS I/O system but have one other possibility. Are keycaches persisted to disk and restored on a clean shutdown and restart ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 7:49 PM, Peter Schuller peter.schul...@infidyne.com wrote: I'm making an assumption . . . I don't yet know enough about cassandra to prove they are in the cache. I have my keycache set to 2 million, and am only querying ~900,000 keys. so after the first time I'm assuming they are in the cache. Note that the key cache only caches the index positions in the data file, and not the actual data. The key cache will only ever eliminate the I/O that would have been required to lookup the index entry; it doesn't help to eliminate seeking to get the data (but as usual, it may still be in the operating system page cache). Yep - I haven't enabled row caches, my calculations at the moment indicate that the hit-ratio won't be great - but I'll be testing that later -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 7:48 PM, Peter Schuller peter.schul...@infidyne.com wrote: Yep - I've been looking at these - I don't see anything in iostat/dstat etc that point strongly to a problem. There is quite a bit of I/O load, but it looks roughly uniform on slow and fast instances of the queries. The last compaction ran 4 days ago - which was before I started seeing variable performance [snip] I now why it is slow - it's clearly I/O bound. I am trying to hunt down why it is sometimes much faster even though I have (tried) to replicate the same conditions What does clearly I/O bound mean, and what is quite a bit of I/O load? the servers spending 50% of the time in io-wait In general, if you have queries that come in at some rate that is determined by outside sources (rather than by the time the last query took to execute), That's an interesting approach - is that likely to give close to optimal performance ? you will typically either get more queries than your cluster can take, or fewer. If fewer, there is a non-trivially sized grey area where overall I/O throughput needed is lower than that available, but the closer you are to capacity the more often requests have to wait for other I/O to complete, for purely statistical reasons. If you're running close to maximum capacity, it would be expected that the variation in query latency is high. That may well explain it - I'll have to think about what that means for our use case as load will be extremely bursty That said, if you're seeing consistently bad latencies for a while where you sometimes see consistently good latencies, that sounds different but would hopefully be observable somehow. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 7:51 PM, Peter Schuller peter.schul...@infidyne.com wrote: For one thing, what does ReadStage's pending look like if you repeatedly run nodetool tpstats on these nodes? If you're simply bottlenecking on I/O on reads, that is the easiest and most direct way to observe this empirically. If you're saturated, you'll see active close to maximum at all times, and pending racking up consistently. If you're just close, you'll likely see spikes sometimes. Yep, the ReadStage is backlogging consistently - but the thing I am trying to explain is why it is good sometimes in an environment that is pretty well controlled - other than being on EC2 -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
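A small sketch of the repeated tpstats check Peter describes. It assumes nodetool is on the PATH and that ReadStage's Active and Pending values are the second and third columns of the 1.0.x tpstats output; adjust the column indexes if your version prints a different layout.

    # Repeated tpstats check: print ReadStage active/pending once a second so
    # saturation shows up as pending racking up. Assumes nodetool is on the PATH
    # and that Active and Pending are the 2nd and 3rd columns of tpstats output.
    import subprocess
    import time

    def read_stage(host="localhost"):
        out = subprocess.check_output(["nodetool", "-h", host, "tpstats"],
                                      universal_newlines=True)
        for line in out.splitlines():
            if line.startswith("ReadStage"):
                cols = line.split()
                return "active=%s pending=%s" % (cols[1], cols[2])
        return "ReadStage not found"

    for _ in range(60):
        print("%s %s" % (time.strftime("%H:%M:%S"), read_stage()))
        time.sleep(1)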
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 8:00 PM, Peter Schuller peter.schul...@infidyne.com wrote: What is your total data size (nodetool info/nodetool ring) per node, your heap size, and the amount of memory on the system? 2 node cluster, 7.9GB of RAM (EC2 m1.large) RF=2 11GB per node Quorum reads 122 million keys heap size is 1867M (default from the AMI I am running) I'm reading about 900k keys As I was just going through cfstats - I noticed something I don't understand Key cache capacity: 906897 Key cache size: 906897 I set the key cache to 2 million, it's somehow got to a rather odd number -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 8:09 PM, Peter Schuller peter.schul...@infidyne.com wrote: the servers spending 50% of the time in io-wait Note that I/O wait is not necessarily a good indicator, depending on situation. In particular if you have multiple drives, I/O wait can mostly be ignored. Similarly if you have non-trivial CPU usage in addition to disk I/O, it is also not a good indicator. I/O wait is essentially giving you the amount of time CPU:s spend doing nothing because the only processes that would otherwise be runnable are waiting on disk I/O. But even a single process waiting on disk I/O - lots of I/O wait even if you have 24 drives. Yep - user space cpu is 20% or much worse when the io-wait goes in to the 90's - looks a great deal like IO bottleknecks The per-disk % utilization is generally a much better indicator (assuming no hardware raid device, and assuming no SSD), along with the average queue size. I doubt that figure is available sensibly in an ec2 instance In general, if you have queries that come in at some rate that is determined by outside sources (rather than by the time the last query took to execute), That's an interesting approach - is that likely to give close to optimal performance ? I just mean that it all depends on the situation. If you have, for example, some N number of clients that are doing work as fast as they can, bottlenecking only on Cassandra, you're essentially saturating the Cassandra cluster no matter what (until the client/network becomes a bottleneck). Under such conditions (saturation) you generally never should expect good latencies. For most non-batch job production use-cases, you tend to have incoming requests driven by something external such as user behavior or automated systems not related to the Cassandra cluster. In this cases, you tend to have a certain amount of incoming requests at any given time that you must serve within a reasonable time frame, and that's where the question comes in of how much I/O you're doing in relation to maximum. For good latencies, you always want to be significantly below maximum - particularly when platter based disk I/O is involved. That may well explain it - I'll have to think about what that means for our use case as load will be extremely bursty To be clear though, even your typical un-bursty load is still bursty once you look at it at sufficient resolution, unless you have something specifically ensuring that it is entirely smooth. A completely random distribution over time for example would look very even on almost any graph you can imagine unless you have sub-second resolution, but would still exhibit un-evenness and have an affect on latency. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
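For the per-disk view Peter recommends, a small helper along these lines can be used. It shells out to sysstat's iostat; the column names %util and avgqu-sz come from the extended output and may differ between sysstat versions, so the parsing is an assumption rather than a guarantee.

    # Per-disk view: run one interval of iostat -x -k and report %util and
    # avgqu-sz per device. Column names come from sysstat's extended output and
    # can differ between versions.
    import subprocess

    def disk_pressure(interval=5):
        out = subprocess.check_output(["iostat", "-x", "-k", str(interval), "2"],
                                      universal_newlines=True)
        lines = out.strip().splitlines()
        start = max(i for i, l in enumerate(lines) if l.startswith("Device"))
        header = lines[start].split()
        util = header.index("%util")
        queue = next(i for i, h in enumerate(header) if h.startswith("avgqu"))
        for line in lines[start + 1:]:
            cols = line.split()
            if cols:
                print("%s util=%s%% avgqu-sz=%s" % (cols[0], cols[util], cols[queue]))

    disk_pressure()

The second report from iostat (the one after the interval) is the interesting one; the first report averages over the whole uptime, which is why the sketch skips to the last Device header.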
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 8:15 PM, Peter Schuller peter.schul...@infidyne.com wrote: 2 node cluster, 7.9GB of RAM (EC2 m1.large) RF=2 11GB per node Quorum reads 122 million keys heap size is 1867M (default from the AMI I am running) I'm reading about 900k keys Ok, so basically a very significant portion of the data fits in page cache, but not all. yep As I was just going through cfstats - I noticed something I don't understand Key cache capacity: 906897 Key cache size: 906897 I set the key cache to 2 million, it's somehow got to a rather odd number You're on 1.0+? yep, 1.0.7 Nowadays there is code to actively make caches smaller if Cassandra detects that you seem to be running low on heap. Watch cassandra.log for messages to that effect (don't remember the exact message right now). I just grep'd the logs and couldn't see anything that looked like that -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com) -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: active/pending queue lengths
On Tue, Feb 14, 2012 at 6:06 AM, aaron morton aa...@thelastpickle.comwrote: What CL are you reading at ? Quorum Write ops go to RF number of nodes, read ops go to RF number of nodes 10% (the default probability that Read Repair will be running) of the time and CL number of nodes 90% of the time. With 2 nodes and RF 2 the QUOURM is 2, every request will involve all nodes. Yep, the thing tat confuses is the different behaviour for reading from one node versus two As to why the pending list gets longer, do you have some more info ? What process are you using to measure ? It's hard to guess why. In this setup every node will have the data and should be able to do a local read and then on the other node. I have four pycassa clients, two making requests to one server and two making requests to the other (or all four making requests to the same server). The requested keys don't overlap and I would expect/assume the keys are in the keycache I am looking at the output of nodetool -h tpstats cheers Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/02/2012, at 12:47 AM, Franc Carter wrote: Hi, I've been looking at tpstats as various test queries run and I noticed something I don't understand. I have a two node cluster with RF=2 on which I run 4 parallel queries, each job goes through a list of keys doing a multiget for 2 keys at a time. If two of the queries go to one node and the other two go to a different node then the pending queue on the node gets much longer than if they all go to the one node. I'm clearly missing something here as I would have expected the opposite cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
keycache persisted to disk ?
Hi, I am testing Cassandra on Amazon and finding performance can vary fairly wildly. I'm leaning towards it being an artifact of the AWS I/O system but have one other possibility. Are keycaches persisted to disk and restored on a clean shutdown and restart ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: keycache persisted to disk ?
On Mon, Feb 13, 2012 at 5:03 PM, zhangcheng zhangch...@jike.com wrote: ** I think the keycaches and row caches are both persisted to disk when shut down, and restored from disk when restarted, which then improves the performance. Thanks - that would explain at least some of what I am seeing cheers 2012-02-13 -- zhangcheng -- *From:* Franc Carter *Sent:* 2012-02-13 13:53:56 *To:* user *Cc:* *Subject:* keycache persisted to disk ? Hi, I am testing Cassandra on Amazon and finding performance can vary fairly wildly. I'm leaning towards it being an artifact of the AWS I/O system but have one other possibility. Are keycaches persisted to disk and restored on a clean shutdown and restart ? cheers -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
Re: sensible data model ?
On Wed, Feb 8, 2012 at 6:05 AM, aaron morton aa...@thelastpickle.comwrote: None of those jump out at me as horrible for my case. If I modelled with Super Columns I would have less than 10,000 Super Columns with an average of 50 columns - big but no insane ? I would still try to do it without super columns. The common belief is they are about 10% slower, and they are a lot clunkier. There are some query and delete cases where they do things composite columns cannot, but in general I try to model things without using them first. Ok - it seems cleaner to model without them to me as well. Because of request overhead ? I'm currently using the batch interface of pycassa to do bulk reads. Is the same problem going to bite me if I have many clients reading (using bulk reads) ? In production we will have ~50 clients. pycassa has support for chunking requests to the server https://github.com/pycassa/pycassa/blob/master/pycassa/columnfamily.py#L633 It's because each row requested becomes a read task on the server and is placed into the read thread pool. There are only 32 (default) read thread in the pool. If one query comes along and requests 100 rows, it places 100 tasks in the thread pool where only 32 can be processed at a time. Some will back up as pending tasks and eventually be processed. If row reads reads take 1ms (just to pick a number, may be better) to read 100 rows we are talking about 3 or 4ms for that query. During that time any read requests received will have to wait for read threads. To that client this is excellent, it's has a high row throughput. To the other clients this is not, overall query throughput will drop. More is not always better. Note that as the number of nodes increases and this effect is may be reduced as reading 100 rows may result in the coordinator sending 25 row requests to 4 nodes. And there is also overhead involved in very big requests, see… http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Read-Latency-td5636553.html#a5652476 thanks Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 7/02/2012, at 2:28 PM, Franc Carter wrote: On Tue, Feb 7, 2012 at 6:39 AM, aaron morton aa...@thelastpickle.comwrote: Sounds like a good start. Super columns are not a great fit for modeling time series data for a few reasons, here is one http://wiki.apache.org/cassandra/CassandraLimitations None of those jump out at me as horrible for my case. If I modelled with Super Columns I would have less than 10,000 Super Columns with an average of 50 columns - big but no insane ? It's also a good idea to partition time series data so that the rows do not grow too big. You can have 2 billion columns in a row, but big rows have operational down sides. You could go with either: rows: entity_id:date column: property_name Which would mean each time your query for a date range you need to query multiple rows. But it is possible to get a range of columns / properties. Or rows: entity_id:time_partition column: date:property_name That's an interesting idea - I'll talk to the data experts to see if we have a sensible range. Where time_partition is something that makes sense in your problem domain, e.g. a calendar month. If you often query for days in a month you can then get all the columns for the days you are interested in (using a column range). 
If you only want to get a subset of the entity properties you will need to get them all and filter them client side; depending on the number and size of the properties this may be more efficient than multiple calls. I'm fine with doing work on the client side - I have a bias in that direction as it tends to scale better. One word of warning, avoid sending read requests for lots (i.e. 100's) of rows at once - it will reduce overall query throughput. Some clients like pycassa take care of this for you. Because of request overhead ? I'm currently using the batch interface of pycassa to do bulk reads. Is the same problem going to bite me if I have many clients reading (using bulk reads) ? In production we will have ~50 clients. thanks Good luck. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/02/2012, at 12:12 AM, Franc Carter wrote: Hi, I'm pretty new to Cassandra and am currently doing a proof of concept, and thought it would be a good idea to ask if my data model is sane . . . The data I have, and need to query, is reasonably simple. It consists of about 10 million entities, each of which have a set of key/value properties for each day for about 10 years. The number of keys is in the 50-100 range and there will be a lot of overlap for keys in entity,days The queries I need to make are for sets of key/value properties for an entity on a day, e.g key1,keys2,key3 for 10 entities on 20 days. The number
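Putting Aaron's second layout and the chunking advice together, a hedged pycassa sketch might look like the following. All names, the monthly time partition and the chunk size are illustrative choices rather than anything fixed by the thread.

    # Sketch of the second layout (row key = entity_id:time_partition, column
    # name = date:property_name) plus client-side chunking of big multigets.
    # All names, the monthly partition and the chunk size are illustrative.
    import pycassa

    pool = pycassa.ConnectionPool("MyKeyspace", server_list=["node1:9160"])
    cf = pycassa.ColumnFamily(pool, "Properties")

    def row_key(entity_id, date):                # date as "YYYY-MM-DD"
        return "%s:%s" % (entity_id, date[:7])   # partition by calendar month

    def write_day(entity_id, date, props):
        cf.insert(row_key(entity_id, date),
                  dict(("%s:%s" % (date, k), v) for k, v in props.items()))

    def read_days(entity_id, dates, chunk=20):
        keys = sorted(set(row_key(entity_id, d) for d in dates))
        rows = {}
        for i in range(0, len(keys), chunk):     # keep per-request row counts modest
            rows.update(cf.multiget(keys[i:i + chunk]))
        return rows

Keeping each multiget to a modest number of rows matches the warning above: every requested row becomes a task in the read stage, so one huge request can starve other clients.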
Re: sensible data model ?
On Tue, Feb 7, 2012 at 6:39 AM, aaron morton aa...@thelastpickle.comwrote: Sounds like a good start. Super columns are not a great fit for modeling time series data for a few reasons, here is one http://wiki.apache.org/cassandra/CassandraLimitations None of those jump out at me as horrible for my case. If I modelled with Super Columns I would have less than 10,000 Super Columns with an average of 50 columns - big but no insane ? It's also a good idea to partition time series data so that the rows do not grow too big. You can have 2 billion columns in a row, but big rows have operational down sides. You could go with either: rows: entity_id:date column: property_name Which would mean each time your query for a date range you need to query multiple rows. But it is possible to get a range of columns / properties. Or rows: entity_id:time_partition column: date:property_name That's an interesting idea - I'll talk to the data experts to see if we have a sensible range. Where time_partition is something that makes sense in your problem domain, e.g. a calendar month. If you often query for days in a month you can then get all the columns for the days you are interested in (using a column range). If you only want to get a sub set of the entity properties you will need to get them all and filter them client side, depending on the number and size of the properties this may be more efficient than multiple calls. I'm find with doing work on the client side - I have a bias in that direction as it tends to scale better. One word of warning, avoid sending read requests for lots (i.e. 100's) of rows at once it will reduce overall query throughput. Some clients like pycassa take care of this for you. Because of request overhead ? I'm currently using the batch interface of pycassa to do bulk reads. Is the same problem going to bite me if I have many clients reading (using bulk reads) ? In production we will have ~50 clients. thanks Good luck. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 5/02/2012, at 12:12 AM, Franc Carter wrote: Hi, I'm pretty new to Cassandra and am currently doing a proof of concept, and thought it would be a good idea to ask if my data model is sane . . . The data I have, and need to query, is reasonably simple. It consists of about 10 million entities, each of which have a set of key/value properties for each day for about 10 years. The number of keys is in the 50-100 range and there will be a lot of overlap for keys in entity,days The queries I need to make are for sets of key/value properties for an entity on a day, e.g key1,keys2,key3 for 10 entities on 20 days. The number of entities and/or days in the query could be either very small or very large. I've modeled this with a simple column family for the keys with the row key being the concatenation of the entity and date. My first go, used only the entity as the row key and then used a supercolumn for each date. I decided against this mostly because it seemed more complex for a gain I didn't really understand. Does this seem sensible ? 
thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215 -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215
sensible data model ?
Hi, I'm pretty new to Cassandra and am currently doing a proof of concept, and thought it would be a good idea to ask if my data model is sane . . . The data I have, and need to query, is reasonably simple. It consists of about 10 million entities, each of which have a set of key/value properties for each day for about 10 years. The number of keys is in the 50-100 range and there will be a lot of overlap for keys in entity,days The queries I need to make are for sets of key/value properties for an entity on a day, e.g key1,keys2,key3 for 10 entities on 20 days. The number of entities and/or days in the query could be either very small or very large. I've modeled this with a simple column family for the keys with the row key being the concatenation of the entity and date. My first go, used only the entity as the row key and then used a supercolumn for each date. I decided against this mostly because it seemed more complex for a gain I didn't really understand. Does this seem sensible ? thanks -- *Franc Carter* | Systems architect | Sirca Ltd marc.zianideferra...@sirca.org.au franc.car...@sirca.org.au | www.sirca.org.au Tel: +61 2 9236 9118 Level 9, 80 Clarence St, Sydney NSW 2000 PO Box H58, Australia Square, Sydney NSW 1215