Re: Cassandra read optimization

2012-04-19 Thread Dan Feldman
Hi Tyler and Aaron, Thanks for your replies. Tyler, fetching scs using your pycassa script on our server takes ~7 s - consistent with the times we've been seeing. Now, we aren't really experts in Cassandra, but it seems that JNA is enabled by default for Cassandra 1.0 according to Jeremy (

Re: Cassandra read optimization

2012-04-19 Thread Paolo Bernardi
Look into your Cassandra's logs to see if JNA is really enabled (it really should be, by default), and more importantly if JNA is loaded correctly. You might find some surprising message over there: if this is the case, just install JNA with your distro's package manager and, if it still doesn't

Re: Cassandra read optimization

2012-04-19 Thread Dan Feldman
Hi Paolo, Thanks for the hint - JNA indeed wasn't installed. However, now that Cassandra is actually using it, there doesn't seem to be any change in terms of speed - still 7 seconds with pycassa. On Thu, Apr 19, 2012 at 12:14 AM, Paolo Bernardi berna...@gmail.com wrote: Look into your

RE: blob fields, binary or hexa?

2012-04-19 Thread mdione.ext
From: phuduc nguyen [mailto:duc.ngu...@pearson.com] How are you passing a blob or binary stream to the CLI? It sounds like you're passing in a representation of a binary stream as ASCII/UTF-8, which will create the problems you describe. So this is only a limitation of Cassandra-cli? -- Marcos
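To make the distinction above concrete, here is a minimal sketch in plain Python (no Cassandra client involved) of the difference between a binary value and a text representation of it. The blob bytes and helper names are hypothetical; the sketch does not claim anything about how cassandra-cli itself parses its input.

```python
def survives_utf8_text(blob: bytes) -> bool:
    """True only if the bytes can round-trip through a UTF-8 *text* value unchanged."""
    try:
        return blob.decode("utf-8").encode("utf-8") == blob
    except UnicodeDecodeError:
        # Arbitrary binary data is usually not valid UTF-8 at all.
        return False


def to_hex(blob: bytes) -> str:
    """An ASCII-safe representation of the bytes (two hex digits per byte)."""
    return blob.hex()


def from_hex(text: str) -> bytes:
    """Convert the hex representation back into the original bytes before storing."""
    return bytes.fromhex(text)


# Hypothetical blob: 0xFF can never start a valid UTF-8 sequence.
blob = bytes([0xFF, 0x00, 0x10, 0xC3])
```

The point of the hex round trip is that the text form is only a transport encoding; whatever stores the value must turn it back into bytes rather than keep the characters.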

Bypassing Socket communication

2012-04-19 Thread Tarun Gupta
Hi, I am interested in knowing what is the best way to create my Cassandra Client bypassing the Socket communication and directly interacting with the 'Storage Manager'. I checked Cassandra Wiki and some of the Hector Examples, mostly what I see is that Cassandra when run in embedded mode,

Re: User authorized for modify-keyspace cannot create CFs

2012-04-19 Thread aaron morton
What version are you on ? AFAIK the SimpleAuthenticator, and to some degree authentication (?), has been essentially deprecated as it was considered incomplete and was not under development. This is why the SimpleAuthenticator was moved out to the examples directory in 1.X. I doubt it will be

Re: exceptions after upgrading from 1.0.7 to 1.0.9

2012-04-19 Thread aaron morton
try this http://www.datastax.com/docs/1.0/install/upgrading#upgrading-between-minor-releases-of-cassandra-1-0-x Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 18/04/2012, at 3:02 AM, Tamar Fraenkel wrote: Thanks!!! Two simple actions

Re: Multi Master replication : rejoining a node after split network

2012-04-19 Thread aaron morton
For background: http://www.datastax.com/docs/1.0/cluster_architecture/index http://thelastpickle.com/2011/02/07/Introduction-to-Cassandra/ Which mechanism is used to replicate the changes from one system to another: statement distribution or recording the changeset via triggers or storing the

Re: Bypassing Socket communication

2012-04-19 Thread Watanabe Maki
You can get some idea from reading org.apache.cassandra.thrift.CassandraServer.java, but I wonder what kind of use case will justify such effort. From iPhone On 2012/04/19, at 18:17, Tarun Gupta tarun.gu...@technogica.com wrote: Hi, I am interesting in knowing what is the best way to

Re: Single Vs. Multiple Keyspaces

2012-04-19 Thread aaron morton
I would suggest you build one cluster, using all your nodes, and create one keyspace for all users. There are lots of reasons, here are a few: * many nodes in a single cluster spread the load and give you fault tolerance. * read and write requests can be distributed in a many-node cluster. *

Re: RMI/JMX errors, weird

2012-04-19 Thread aaron morton
At some point the gossip system on the node this log is from decided that 130.199.185.195 was DOWN. This was based on how often the node was gossiping to the cluster. The active repair session was informed. And to avoid failing the job unnecessarily it tested that the errant node's phi value
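The "phi value" mentioned above comes from an accrual failure detector: instead of a yes/no timeout, each node computes a suspicion level from how late a peer's next gossip heartbeat is. A minimal sketch, assuming (as a simplification) that heartbeat inter-arrival times are exponentially distributed; the function names are hypothetical and this is not Cassandra's FailureDetector source. The threshold of 8 mirrors the default phi_convict_threshold in cassandra.yaml; check your own configuration.

```python
import math
from statistics import mean
from typing import Sequence

# With exponential inter-arrival times, P(no heartbeat for t) = exp(-t / mean_interval),
# and phi = -log10 of that probability = t / (mean_interval * ln 10).
PHI_FACTOR = 1.0 / math.log(10.0)


def phi(now: float, last_heartbeat: float, intervals: Sequence[float]) -> float:
    """Suspicion level for a peer, given recent heartbeat inter-arrival times (seconds)."""
    elapsed = now - last_heartbeat
    return PHI_FACTOR * elapsed / mean(intervals)


def is_convicted(phi_value: float, threshold: float = 8.0) -> bool:
    """A peer is marked DOWN once its phi exceeds the convict threshold."""
    return phi_value > threshold
```

With heartbeats arriving roughly once a second, a peer silent for 10 s has a phi of about 4.3 and is still considered up, while 20 s of silence pushes phi past 8. A node that is merely slow to gossip (GC pauses, overload) can therefore be convicted without having crashed.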

Re: Multi Master replication : rejoining a node after split network

2012-04-19 Thread Romain HARDOUIN
As timestamps are set by clients, a common gotcha is having some or all clients not synchronised by NTP.
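Why unsynchronised clients matter: conflicting writes to the same column are resolved by timestamp, and the higher timestamp wins regardless of which write actually happened later. A minimal last-write-wins sketch; the Column class, the 5-second skew, and the values are hypothetical, and the sketch does not model how Cassandra breaks exact timestamp ties.

```python
from dataclasses import dataclass

SECOND = 1_000_000  # timestamps expressed in microseconds


@dataclass
class Column:
    value: str
    timestamp: int  # supplied by the *client*, not by the server


def reconcile(a: Column, b: Column) -> Column:
    """Last-write-wins: keep the column with the higher client timestamp."""
    return a if a.timestamp >= b.timestamp else b


real_time = 1_000 * SECOND
skew_a = 5 * SECOND  # hypothetical: client A's clock runs 5 s fast

# A writes first (in real time), B writes 2 s later with a correct clock.
first = Column("written first by A", real_time + skew_a)
second = Column("written later by B", real_time + 2 * SECOND)

winner = reconcile(first, second)
```

Here the earlier write from client A survives and B's newer write is silently discarded, which is exactly the kind of surprise that keeping all clients on NTP avoids.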

200TB in Cassandra ?

2012-04-19 Thread Franc Carter
Hi, One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm worried. 200TB with RF=3 is 600TB =
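The arithmetic in this thread (200TB of raw data with RF=3 means 600TB on disk, possibly reduced by compression) can be captured in a small helper. This is a sketch only: per_node_tb stands for whatever usable capacity per node your own rule of thumb gives, and the values used below are placeholders, not recommendations. Note also that, as Franc says later in the thread, data that is already gzipped will compress little further, so a compression_ratio near 1 is the realistic case here.

```python
import math


def stored_tb(raw_tb: float, replication_factor: int, compression_ratio: float = 1.0) -> float:
    """Total data on disk across the cluster after replication and compression."""
    return raw_tb * replication_factor / compression_ratio


def nodes_needed(raw_tb: float, replication_factor: int, per_node_tb: float,
                 compression_ratio: float = 1.0) -> int:
    """Rough node count: stored data divided by usable capacity per node, rounded up."""
    return math.ceil(stored_tb(raw_tb, replication_factor, compression_ratio) / per_node_tb)


# 200TB at RF=3, no further compression, placeholder capacity of 2TB per node.
example = nodes_needed(200, 3, per_node_tb=2.0)
```

Because the node count scales linearly with both replication factor and bytes per item, reducing storage per item (as suggested later in the thread) is usually the biggest lever.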

Re: Multi Master replication : rejoining a node after split network

2012-04-19 Thread Samba
Thanks Aaron and Romain, very useful information indeed; and yes there is no alternative to personally trying out and dirtying our hands. Regards, Samba

Re: Cassandra read optimization

2012-04-19 Thread aaron morton
Here's a test I did a while ago about creating column objects in Python http://www.mail-archive.com/user@cassandra.apache.org/msg06729.html As Tyler said, the best approach is to limit the size of the slices. If you are trying to load 125K super columns with 25 columns each you are asking
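Limiting slice size usually means paging: request a bounded number of columns, then use the last column name returned as the start of the next slice. A minimal sketch over an in-memory row; fetch_slice is a hypothetical stand-in for a single client slice request (start inclusive, at most count columns), not any particular client's API.

```python
from bisect import bisect_left
from typing import Dict, Iterator, List, Tuple


def fetch_slice(row: Dict[str, dict], start: str, count: int) -> List[Tuple[str, dict]]:
    """Hypothetical stand-in for one slice request: up to `count` columns with name >= start."""
    names = sorted(row)
    i = bisect_left(names, start)
    return [(name, row[name]) for name in names[i:i + count]]


def iter_columns(row: Dict[str, dict], page_size: int = 1000) -> Iterator[Tuple[str, dict]]:
    """Yield every column in name order while never requesting more than page_size + 1."""
    start = ""
    skip_first = False
    while True:
        want = page_size + 1 if skip_first else page_size
        page = fetch_slice(row, start, want)
        got = len(page)
        if skip_first:
            page = page[1:]  # slice start is inclusive: drop the column already yielded
        yield from page
        if got < want or not page:
            return  # a short page means the row is exhausted
        start = page[-1][0]
        skip_first = True
```

Each request materialises at most page_size + 1 columns, so memory and per-request latency stay bounded even for a row holding 125K super columns; the trade-off is more round trips, which is discussed further down the thread.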

RE 200TB in Cassandra ?

2012-04-19 Thread Romain HARDOUIN
Cassandra supports data compression and depending on your data, you can gain a reduction in data size up to 4x. 600 TB is a lot, hence requires lots of servers... Franc Carter franc.car...@sirca.org.au wrote on 19/04/2012 13:12:19: Hi, One of the projects I am working on is going to

Re: RE 200TB in Cassandra ?

2012-04-19 Thread Franc Carter
On Thu, Apr 19, 2012 at 9:38 PM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote: Cassandra supports data compression and depending on your data, you can gain a reduction in data size up to 4x. The data is gzip'd already ;-) 600 TB is a lot, hence requires lots of servers... Franc

Re: exceptions after upgrading from 1.0.7 to 1.0.9

2012-04-19 Thread Tamar Fraenkel
Thanks. This was the one I followed :) Wonder if there is something more detailed... Tamar Fraenkel Senior Software Engineer, TOK Media ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54 8356490 Fax: +972 2 5612956 On Thu, Apr 19, 2012 at 1:06 PM, aaron

Re: 200TB in Cassandra ?

2012-04-19 Thread John Doe
Franc Carter franc.car...@sirca.org.au One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of thumb I have seen for how much storage should be on each node I'm

Re: RE 200TB in Cassandra ?

2012-04-19 Thread Yiming Sun
600 TB is really a lot, even 200 TB is a lot. In our organization, storage at such scale is handled by our storage team and they purchase specialized (and very expensive) equipment from storage hardware vendors because at this scale, performance and reliability are absolutely critical. But it

Re: 200TB in Cassandra ?

2012-04-19 Thread Franc Carter
On Thu, Apr 19, 2012 at 10:07 PM, John Doe jd...@yahoo.com wrote: Franc Carter franc.car...@sirca.org.au One of the projects I am working on is going to need to store about 200TB of data - generally in manageable binary chunks. However, after doing some rough calculations based on rules of

Re: RE 200TB in Cassandra ?

2012-04-19 Thread Franc Carter
On Thu, Apr 19, 2012 at 10:16 PM, Yiming Sun yiming@gmail.com wrote: 600 TB is really a lot, even 200 TB is a lot. In our organization, storage at such scale is handled by our storage team and they purchase specialized (and very expensive) equipment from storage hardware vendors because

Re: RE 200TB in Cassandra ?

2012-04-19 Thread Nigel Kerr
Can you say more about how and how often these 200TB get used, queried, updated? Is a different usage profile needed? What kind of column families do you have in mind for them? On Thu, Apr 19, 2012 at 8:24 AM, Franc Carter franc.car...@sirca.org.au wrote: On Thu, Apr 19, 2012 at 10:16 PM,

Re: 200TB in Cassandra ?

2012-04-19 Thread Dave Brosius
I think your math is 'relatively' correct. It would seem to me you should focus on how you can reduce the amount of storage you are using per item, if at all possible, if that node count is prohibitive. On 04/19/2012 07:12 AM, Franc Carter wrote: Hi, One of the projects I am working on is

Re: Bypassing Socket communication

2012-04-19 Thread Romain HARDOUIN
Take a peep at cassandra-unit, maybe this could help you : https://github.com/jsevellec/cassandra-unit

Re: blob fields, binary or hexa?

2012-04-19 Thread phuduc nguyen
Well, I'm not sure exactly how you're passing a blob to the CLI. It would be helpful if you pasted your commands/code and maybe there is a simple oversight. With that said, Cassandra can most definitely save blob/binary values. I think most people use a high level client; we use Hector. If

Re: blob fields, binary or hexa?

2012-04-19 Thread R. Verlangen
PHPCassa does support binaries, so that should not be the problem. 2012/4/19 phuduc nguyen duc.ngu...@pearson.com Well, I'm not sure exactly how you're passing a blob to the CLI. It would be helpful if you pasted your commands/code and maybe there is a simple oversight. With that said,

migrating from SimpleStrategy to NetworkTopologyStrategy

2012-04-19 Thread simojenki
Hi, Is there any documentation on the procedure for migrating from SimpleStrategy to NetworkTopologyStrategy? Thanks, Simon

Write Performance

2012-04-19 Thread Trevor Francis
Would there be any reason why I can't write more than 875 writes/sec to a cluster of 2 Cassandra boxes? They are quad-core machines with 8 GB of RAM running RAID 10, so not huge servers... but certainly enough to handle a much larger load than that. We are feeding data into it through a Flume

Two Random Ports in Private port range

2012-04-19 Thread W F
Hi All, I did a web search of the archives (hope I looked in the right place) and could not find a request like this. When Cassandra is running, it seems to create two random TCP listening ports, for example: 50378 and 58692, 49952, 52792. What are these for and is there documentation

default required in cassandra-topology.properties?

2012-04-19 Thread Bill Au
All the examples of cassandra-topology.properties that I have seen have a default entry assigning unknown nodes to a specific data center and rack. Is it possible to have Cassandra ignore unknown nodes for the purpose of replication? Bill

Re: Cassandra read optimization

2012-04-19 Thread Dan Feldman
We'll try doing multithreaded requests today or tomorrow. As for tuning down the number of supercolumns per slice, I tried doing that, but I've noticed that the time was decreasing linearly with the length of the slice. So, grabbing 1000 per slice would take 1/5 as long as 5000, but I'll have to make
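One way to try the multithreaded approach is to cut the work into chunks of fixed size and issue one request per chunk on a thread pool, so many small requests run concurrently instead of sequentially. A minimal sketch; fetch_chunk is a hypothetical callable wrapping one client request (for example over a chunk of row keys or a precomputed column range), and the chunk size and worker count are placeholders. Whether your client's connection objects can be shared across threads depends on the client, so check its documentation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunks(items: Sequence[T], size: int) -> List[Sequence[T]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def fetch_all(items: Sequence[T], fetch_chunk: Callable[[Sequence[T]], List[R]],
              chunk_size: int = 1000, workers: int = 8) -> List[R]:
    """Run one request per chunk concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the chunks, whatever order they finish in.
        results = pool.map(fetch_chunk, chunks(items, chunk_size))
        return [item for chunk_result in results for item in chunk_result]
```

Threads help here because the time goes into waiting on network I/O rather than Python computation, so five 1000-column requests in flight at once can finish well before one 5000-column request.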

Re: migrating from SimpleStrategy to NetworkTopologyStrategy

2012-04-19 Thread Marcus Both
I think it is enough to update the keyspace, for example (cassandra-cli): update keyspace KEYSPACE with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {datacenter1: 1}; On Thu, 19 Apr 2012 16:18:46 +0100 simojenki simoje...@gmail.com

High Log Storage

2012-04-19 Thread Trevor Francis
I have a web application that generates multiple log files in a log file directory. On a particularly chatty box, up to 2000 entries per second are written to those log files. We are looking for a solution to tail that directory and insert new entries into a cassandra db. The fields in the

RE: default required in cassandra-topology.properties?

2012-04-19 Thread Richard Lowe
Yes it is possible. Put the following as the last line of your topology file: default=unknown:unknown So long as you don't have any DC or rack with this name your local node will not be able to address any nodes that aren't explicitly given in its topology file. However bear in mind that,
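The lookup rule described above (an explicitly listed node gets its own DC:rack, anything else falls back to the default entry) can be sketched as follows. This is an illustration of the semantics discussed in the thread, not PropertyFileSnitch's source; the IP addresses, DC and rack names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical topology file: two known nodes plus the suggested catch-all default.
SAMPLE = """
# node IP = data centre : rack
192.0.2.10=DC1:RAC1
192.0.2.11=DC1:RAC2
default=unknown:unknown
"""


def parse_topology(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse 'key=DC:RACK' lines, skipping blanks and comments."""
    entries: Dict[str, Tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        dc, _, rack = value.strip().partition(":")
        entries[key.strip()] = (dc.strip(), rack.strip())
    return entries


def locate(entries: Dict[str, Tuple[str, str]], endpoint: str) -> Optional[Tuple[str, str]]:
    """Explicit entry if present, else the default entry, else None."""
    return entries.get(endpoint, entries.get("default"))
```

An unlisted node resolves to ("unknown", "unknown"); as long as no keyspace's strategy_options names a DC called unknown, no replicas are placed there.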

Re: High Log Storage

2012-04-19 Thread bill
Try writing them through Kafka. It should handle that load. Bill Sent from my BlackBerry® wireless handheld -Original Message- From: Trevor Francis trevor.fran...@tgrahamcapital.com Date: Thu, 19 Apr 2012 12:04:19 To: user@cassandra.apache.org Reply-To: user@cassandra.apache.org Subject: High

Re: default required in cassandra-topology.properties?

2012-04-19 Thread Bill Au
I had thought that the topology file is used for replica placement only, such that for the token range the unknown node is responsible for, data is still read and written there. It just won't be replicated since the replication factor is not defined. Bill On Thu, Apr 19, 2012 at 1:18 PM, Richard

Re: 200TB in Cassandra ?

2012-04-19 Thread aaron morton
Couple of ideas: * take a look at compression in 1.X http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression * is there repetition in the binary data ? Can you save space by implementing content addressable storage ?
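Content addressable storage means keying each binary chunk by a hash of its bytes, so identical chunks are stored once and items refer to them by digest. A minimal in-memory sketch using SHA-256; the ContentStore class is hypothetical, and in a Cassandra schema the digest would presumably become the row key of a blob column family, which is an assumption rather than a documented pattern.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Blobs keyed by their SHA-256 digest: identical content is stored only once."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # setdefault keeps the existing copy if this content was stored before.
        self.blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]
```

The space saving depends entirely on how much repetition the data really has; for unique, already-gzipped chunks the digest only adds a little overhead.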

Re: Write Performance

2012-04-19 Thread aaron morton
You should be able to get more than that. Run nodetool cfstats and look at the Write Latency (this is the recent latency, i.e. it is reset each time you run it). This will give you an idea of how long an individual node is spending on a write. Fire up JConsole, go to the StorageProxy MBean and

Re: Cassandra read optimization

2012-04-19 Thread aaron morton
but i'll have to make 5 times as many requests to the database 5 times a small number can be less than 1 big number :) see http://wiki.apache.org/cassandra/HadoopSupport It's also covered in the O'Reilly Cassandra book, however that book is somewhat out of date. also search for posts from

Re: migrating from SimpleStrategy to NetworkTopologyStrategy

2012-04-19 Thread aaron morton
There is this, it's old: http://wiki.apache.org/cassandra/Operations#Replication There was also a discussion about it in the last month or so. I *think* it's ok so long as you move to a single DC and single rack. But please test.

RE: DataStax Opscenter 2.0 question

2012-04-19 Thread Jay Parashar
Firefox version 3.6.10 on Ubuntu 10.10. Let me update it and try. Thanks Nick! Will let you know. -Original Message- From: Nick Bailey [mailto:n...@datastax.com] Sent: Wednesday, April 18, 2012 4:56 PM To: user@cassandra.apache.org Subject: Re: DataStax Opscenter 2.0 question What

RE: DataStax Opscenter 2.0 question

2012-04-19 Thread Jay Parashar
Thanks Nick, that was it. With Firefox 11, it works. -Original Message- From: Nick Bailey [mailto:n...@datastax.com] Sent: Wednesday, April 18, 2012 4:56 PM To: user@cassandra.apache.org Subject: Re: DataStax Opscenter 2.0 question What version of firefox? Someone has reported a similar

Re: migrating from SimpleStrategy to NetworkTopologyStrategy

2012-04-19 Thread Ravikumar Govindarajan
We tried this route previously. We did not run repair at all {our use-cases don't need a repair} but while adding a secondary data center, we were forced to run repair. It ended up exploding the data. We finally had to start afresh, scrapped the cluster and re-import the data with NTS. Now,