Re: Facebook messaging and choice of HBase over Cassandra - what can we learn?

2010-11-21 Thread Edward Capriolo
On Sun, Nov 21, 2010 at 12:10 PM, André Fiedler fiedler.an...@googlemail.com wrote: Facebook Messaging – HBase Comes of Age http://facility9.com/2010/11/18/facebook-messaging-hbase-comes-of-age 2010/11/21 David Boxenhorn da...@lookin2.com Eventual consistency is not good enough for instant

Re: Cassandra memtable and GC

2010-11-22 Thread Edward Capriolo
On Mon, Nov 22, 2010 at 8:28 AM, Shotaro Kamio kamios...@gmail.com wrote: Hi Peter, I've tested again with recording LiveSSTableCount and MemtableDataSize via jmx. I guess this result supports my suspect on memtable performance because I cannot find Full GC this time. This is a result in

Re: cassandra vs hbase summary (was facebook messaging)

2010-11-22 Thread Edward Capriolo
On Mon, Nov 22, 2010 at 2:52 PM, Todd Lipcon t...@lipcon.org wrote: On Mon, Nov 22, 2010 at 10:01 AM, David Jeske dav...@gmail.com wrote: I haven't used either Cassandra or hbase, so please don't take any part of this message as me attempting to state facts about either system. However, I'm

Re: cassandra vs hbase summary (was facebook messaging)

2010-11-22 Thread Edward Capriolo
On Mon, Nov 22, 2010 at 2:56 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Mon, Nov 22, 2010 at 2:52 PM, Todd Lipcon t...@lipcon.org wrote: On Mon, Nov 22, 2010 at 10:01 AM, David Jeske dav...@gmail.com wrote: I haven't used either Cassandra or hbase, so please don't take any part

Re: cassandra vs hbase summary (was facebook messaging)

2010-11-22 Thread Edward Capriolo
On Mon, Nov 22, 2010 at 5:14 PM, Todd Lipcon t...@lipcon.org wrote: On Mon, Nov 22, 2010 at 1:58 PM, David Jeske dav...@gmail.com wrote: On Mon, Nov 22, 2010 at 11:52 AM, Todd Lipcon t...@lipcon.org wrote: Not quite. The replica synchronization code is pretty messy, but basically it will

Re: cassandra vs hbase summary (was facebook messaging)

2010-11-22 Thread Edward Capriolo
On Mon, Nov 22, 2010 at 5:48 PM, David Jeske dav...@gmail.com wrote: On Mon, Nov 22, 2010 at 2:44 PM, David Jeske dav...@gmail.com wrote: On Mon, Nov 22, 2010 at 2:39 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Return messages such as your data was written to at least 1 node

Re: monitoring read and write problems via log file?

2010-11-24 Thread Edward Capriolo
On Wed, Nov 24, 2010 at 3:04 AM, Peter Schuller peter.schul...@infidyne.com wrote: I was told by a colleague that read and write problems in Cassandra can be detected by monitoring a Cassandra log file. What do you mean by problem? If you mean something like a hard I/O error or corruption

Re: Capacity problem with a lot of writes?

2010-11-26 Thread Edward Capriolo
On Fri, Nov 26, 2010 at 10:49 AM, Peter Schuller peter.schul...@infidyne.com wrote: Making compaction parallel isn't a priority because the problem is almost always the opposite: how do we spread it out over a longer period of time instead of sharp spikes of activity that hurt read/write

Re: Using mySQL to emulate Cassandra

2010-11-28 Thread Edward Capriolo
On Sun, Nov 28, 2010 at 11:35 AM, Tom Melendez t...@supertom.com wrote: On Sun, Nov 28, 2010 at 12:28 AM, David Boxenhorn da...@lookin2.com wrote: As our launch date approaches, I am getting increasingly nervous about Cassandra tuning. It is a mysterious black art that I haven't mastered even

Re: get_count - cassandra 0.7.x predicate limit bug?

2010-11-30 Thread Edward Capriolo
On Tue, Nov 30, 2010 at 1:00 AM, Tyler Hobbs ty...@riptano.com wrote: What error are you getting? Remember, get_count() is still just about as much work for cassandra as getting the whole row; the only advantage is it doesn't have to send the whole row back to the client. If you're counting

Re: how to see how many rows in each node?

2010-12-03 Thread Edward Capriolo
On Fri, Dec 3, 2010 at 12:53 PM, Robert Coli rc...@digg.com wrote: On 12/3/10 6:09 AM, Jonathan Ellis wrote: Divide space used by average row size from cfstats On Fri, Dec 3, 2010 at 7:58 AM, Donal Zang zan...@ihep.ac.cn wrote: RT. Is there any command or api? In 0.6.x : strings
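The "divide space used by average row size" suggestion above is simple arithmetic. A minimal sketch, assuming both figures have been read by hand from cfstats output; the function name and the sample numbers are made up for illustration and are not part of any Cassandra tool:

```python
def estimate_row_count(space_used_bytes, avg_row_size_bytes):
    """Rough per-node row estimate: space used divided by average row size.

    Both inputs are assumed to be copied manually from cfstats output.
    """
    if avg_row_size_bytes <= 0:
        raise ValueError("average row size must be positive")
    return space_used_bytes // avg_row_size_bytes


# Hypothetical figures: 10 MB on disk, 500-byte average row.
print(estimate_row_count(10_000_000, 500))
```

The result is only an estimate: space used can include data not yet merged by compaction, so the real row count may be lower.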

Running multiple instances on a single server --micrandra ??

2010-12-07 Thread Edward Capriolo
I am quite ready to be stoned for this thread but I have been thinking about this for a while and I just wanted to bounce these ideas off some gurus. Cassandra does allow multiple data directories, but as far as I can tell no one runs in this configuration. This is something that is very

Re: Running multiple instances on a single server --micrandra ??

2010-12-10 Thread Edward Capriolo
On Thu, Dec 9, 2010 at 10:40 PM, Bill de hÓra b...@dehora.net wrote: On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote: The idea behind micrandra is for a 6 disk system run 6 instances of Cassandra, one per disk. Use the RackAwareSnitch to make sure no replicas live on the same node

Re: Running multiple instances on a single server --micrandra ??

2010-12-10 Thread Edward Capriolo
On Fri, Dec 10, 2010 at 11:39 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Thu, Dec 9, 2010 at 10:40 PM, Bill de hÓra b...@dehora.net wrote: On Tue, 2010-12-07 at 21:25 -0500, Edward Capriolo wrote: The idea behind micrandra is for a 6 disk system run 6 instances of Cassandra, one

Re: N to N relationships

2010-12-12 Thread Edward Capriolo
On Sun, Dec 12, 2010 at 3:20 AM, David Boxenhorn da...@lookin2.com wrote: You want to store every value twice? That would be a pain to maintain, and possibly lead to inconsistent data. On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey n...@riptano.com wrote: I would also recommend two column

Re: unable to start cassandra-0.7r2

2010-12-13 Thread Edward Capriolo
On Mon, Dec 13, 2010 at 5:45 PM, Eric Evans eev...@rackspace.com wrote: On Mon, 2010-12-13 at 17:27 -0500, Liangzhao Zeng wrote: I can run the 0.66 using same logging setup without any problem. Not sure what's the difference when starting up the 0.7 in eclipse. Can someone share the logging

Re: Read Latency Degradation

2010-12-17 Thread Edward Capriolo
On Fri, Dec 17, 2010 at 8:21 AM, Wayne wav...@gmail.com wrote: We have been testing Cassandra for 6+ months and now have 10TB in 10 nodes with rf=3. It is 100% real data generated by real code in an almost production level mode. We have gotten past all our stability issues, java/cmf issues,

Re: Cassandra Monitoring

2010-12-17 Thread Edward Capriolo
On Fri, Dec 17, 2010 at 5:48 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Hi all just wanted to share a simple way we use to monitor cassandra internals with zabbix. We use a minimal http server which reads jmx and returns the values in a property form. That's read by zabbix every

Re: Read Latency Degradation

2010-12-17 Thread Edward Capriolo
, 587k filter, 18meg index for the other. Thanks. On Fri, Dec 17, 2010 at 10:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Fri, Dec 17, 2010 at 8:21 AM, Wayne wav...@gmail.com wrote: We have been testing Cassandra for 6+ months and now have 10TB in 10 nodes with rf=3

Re: Which Java on Fedora? Sun's or GNU's?

2010-12-29 Thread Edward Capriolo
On Wed, Dec 29, 2010 at 11:29 AM, Eric Evans eev...@rackspace.com wrote: On Wed, 2010-12-29 at 10:56 -0500, Edward Capriolo wrote: Cassandra pushes your JVM hard. Do not count on your distro which might provide versions of things that are 3 months to 2 years old. Come on.  If it worked fine 3

Re: The size of the data, I must be doing smth wrong....

2011-01-05 Thread Edward Capriolo
On Wed, Jan 5, 2011 at 9:52 AM, Jonathan Ellis jbel...@gmail.com wrote: It's normal for Cassandra to use more disk space than MySQL.  It's part of what we trade for not having to rewrite every row when you add a new column. SSTables that are obsoleted by a compaction are deleted

Re: Is this a good schema design to implement a social application..

2011-01-08 Thread Edward Capriolo
On Fri, Jan 7, 2011 at 11:38 PM, Rajkumar Gupta rajkumar@gmail.com wrote: In the twissandra example, http://www.riptano.com/docs/0.6/data_model/twissandra#adding-friends , I find that they have split the materialized view of a user's homepage (like his followers list, tweets from friends)

Re: Welcome committer Jake Luciani

2011-01-13 Thread Edward Capriolo
Three cheers! On Thu, Jan 13, 2011 at 1:45 PM, Jake Luciani jak...@gmail.com wrote: Thanks Jonathan and Cassandra PMC! Happy to help Cassandra take over the world! -Jake On Thu, Jan 13, 2011 at 1:41 PM, Jonathan Ellis jbel...@gmail.com wrote: The Cassandra PMC has voted to add Jake as a

Re: cassandra row cache

2011-01-13 Thread Edward Capriolo
Is it possible that you are reading at READ.ONE and that READ.ONE only warms cache on 1 of your three nodes= 20. 2nd read warms another 60%, and by the third read all the replicas are warm? 99% ? This would be true if digest reads were not warming caches. Edward On Thu, Jan 13, 2011 at 4:07

Re: about the data directory

2011-01-13 Thread Edward Capriolo
On Thu, Jan 13, 2011 at 7:56 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: I am somewhat confused: why can users read the data from all nodes? I mean, the data is only kept on the replicas, so how is this achieved? -Original Message- From: sc...@scode.org [mailto:sc...@scode.org] On

Re: live data migration from mysql to cassandra

2011-01-14 Thread Edward Capriolo
On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Hello Dear community please share your experience, how you make live (without stopping) migration from mysql or other RDBMS to cassandra. There is no built in way to do this. I remember hearing at hadoop world this year

Re: Cassandra in less than 1G of memory?

2011-01-14 Thread Edward Capriolo
On Fri, Jan 14, 2011 at 2:13 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: Dear rajat, Yes it is possible, I have the same constraints. However I must warn you, from what I see Cassandra memory consumption is not bounded in 0.6.X on debian 64 Bit Here is an example of an instance

Re: balancing load

2011-01-16 Thread Edward Capriolo
On Sun, Jan 16, 2011 at 11:45 AM, Karl Hiramoto k...@hiramoto.org wrote: Hi, I have a keyspace with  Replication Factor: 2 and it seems though that most of my data goes to one node. What am I missing to have Cassandra balance more evenly? ./nodetool  -h host1 ring Address         Status

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 2:44 AM, aaron morton aa...@thelastpickle.com wrote: The nodes will not automatically delete stale data, to do that you need to run nodetool cleanup. See step 3 in the Range Changes Bootstrap http://wiki.apache.org/cassandra/Operations#Range_changes If you are

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 10:51 AM, Peter Schuller peter.schul...@infidyne.com wrote: Just to head off the next possible problem. If you run 'nodetool cleanup' on each node and some of your nodes still have more data than others, then it probably means you are writing the majority of data to a few

Re: balancing load

2011-01-17 Thread Edward Capriolo
On Mon, Jan 17, 2011 at 1:20 PM, Karl Hiramoto k...@hiramoto.org wrote: On 01/17/11 15:54, Edward Capriolo wrote: Just to head off the next possible problem. If you run 'nodetool cleanup' on each node and some of your nodes still have more data than others, then it probably means you are writing

Re: changing the replication level on the fly

2011-01-18 Thread Edward Capriolo
On Tue, Jan 18, 2011 at 2:14 PM, Jeremy Stribling st...@nicira.com wrote: Hi, I've noticed in the new Cassandra 0.7.0 release that if I have a keyspace with a replication level of 2, but only one Cassandra node, I cannot insert anything into the system.  Likely this was a bug in the old

Re: please help with multiget

2011-01-18 Thread Edward Capriolo
On Tue, Jan 18, 2011 at 4:29 PM, Shu Zhang szh...@mediosystems.com wrote: Well, I don't think what I'm describing is complicated semantics. I think I've described general batch operation design and something that is symmetrical the batch_mutate method already on the Cassandra API. You are

Re: Cassandra on iSCSI?

2011-01-21 Thread Edward Capriolo
On Fri, Jan 21, 2011 at 12:07 PM, Jonathan Ellis jbel...@gmail.com wrote: On Fri, Jan 21, 2011 at 2:19 AM, Mick Semb Wever m...@apache.org wrote: Of course with a SAN you'd want RF=1 since it's replicating internally. Isn't this the same case for raid-5 as well? No, because the replication

Re: Lost MUTATIONS on several Cassandra nodes - no impact on the client

2011-01-23 Thread Edward Capriolo
On Sun, Jan 23, 2011 at 6:30 AM, ruslan usifov ruslan.usi...@gmail.com wrote: 2011/1/20 Jonathan Ellis jbel...@gmail.com It guarantees that if the requested ConsistencyLevel is not achieved, client will get a TimedOutException, which is a signal you need to add capacity to handle what you

Re: Lost MUTATIONS on several Cassandra nodes - no impact on the client

2011-01-23 Thread Edward Capriolo
On Sun, Jan 23, 2011 at 11:23 AM, ruslan usifov ruslan.usi...@gmail.com wrote: On Sun, Jan 23, 2011 at 6:30 AM, ruslan usifov ruslan.usi...@gmail.com wrote: Right. The difference is that the gossip process builds a topology of UP/DOWN hosts so Unavailable is thrown quickly. If you need ALL

Re: Cassandra + Puppet

2011-01-24 Thread Edward Capriolo
On Mon, Jan 24, 2011 at 5:17 PM, Nate McCall n...@riptano.com wrote: Might be a bit out of date, but this one is useful: https://github.com/cmceniry/cassandrapuppet On Mon, Jan 24, 2011 at 3:51 PM, Aaron Morton aa...@thelastpickle.com wrote: Is anyone using puppet http://www.puppetlabs.com/ 

Re: cassandra as session store

2011-02-01 Thread Edward Capriolo
On Tue, Feb 1, 2011 at 12:57 PM, Anthony John chirayit...@gmail.com wrote: Not a concern - and here is why:- From the wiki arch section captioned below - eventual consistency does not have to mean inconsistent reads. The concern is the overhead for consistent reads. But remember in the use

Re: How to delete bulk data from cassandra 0.6.3

2011-02-05 Thread Edward Capriolo
On Sat, Feb 5, 2011 at 4:12 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Any update on this? On 02/05/2011 12:53 AM, Ali Ahsan wrote: So do we need to write a script? Or is it something I can do as a system admin without involving any developer? If yes, please guide me in this case.

Re: How to delete bulk data from cassandra 0.6.3

2011-02-05 Thread Edward Capriolo
On Sat, Feb 5, 2011 at 11:35 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote: Thanks for replying Edward Capriolo. Will this affect cassandra ring integrity? Another question: will cassandra work properly after this operation? And will it be possible to restore deleted data from backup?

Re: How bad is teh impact of compaction on performance?

2011-02-05 Thread Edward Capriolo
On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem potek...@bnl.gov wrote: Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly is performance affected?

Re: How bad is teh impact of compaction on performance?

2011-02-05 Thread Edward Capriolo
to it? Edward Capriolo wrote: On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem potek...@bnl.gov wrote: Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly

Re: Finding the intersection results of column sets of two rows

2011-02-06 Thread Edward Capriolo
On Sun, Feb 6, 2011 at 10:15 AM, buddhasystem potek...@bnl.gov wrote: Hello, If the amount of data is _that_ small, you'll have a much easier life with MySQL, which supports the join procedure -- because that's exactly what you want to achieve. asil klin wrote: Hi all, I want to

Re: Cassandra memory consumption

2011-02-08 Thread Edward Capriolo
On Tue, Feb 8, 2011 at 4:56 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: I will do that in the future and I will post my results here ( I upgraded the server to debian 6 to see if there is any change, so memory is back to normal). I will report in a few days. In the meantime I am open

Re: Specifying row caching on per query basis ?

2011-02-09 Thread Edward Capriolo
On Wed, Feb 9, 2011 at 2:43 PM, Ertio Lew ertio...@gmail.com wrote: Is this under consideration for future releases ? or being thought about!? On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis jbel...@gmail.com wrote: Currently there is not. On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew

Re: Default Listen Port

2011-02-09 Thread Edward Capriolo
On Wed, Feb 9, 2011 at 4:00 PM, jeremy.truel...@barclayscapital.com wrote: What’s the easiest way to change the port nodes listen for comm on from other nodes? It appears that the default is 8080 which collides with my tomcat server on one of our dev boxes. I tried doing something in

Re: Is Avro still supported?

2011-02-12 Thread Edward Capriolo
https://issues.apache.org/jira/browse/CASSANDRA-926 On Sat, Feb 12, 2011 at 8:27 AM, Joshua Partogi joshua.j...@gmail.com wrote: Hi, I saw in the latest source in trunk, avro codes has been deleted. Does this mean Avro is not supported anymore? If so, what was the decision behind dropping

Re: Does Cassandra support multiple listen_address and rpc_address?

2011-02-13 Thread Edward Capriolo
On Sun, Feb 13, 2011 at 1:39 AM, Xiaobo Gu guxiaobo1...@gmail.com wrote: multiple network paths for inner-cluster communication will boost performance Thanks. Xiaobo Gu No. Each node has a single IP. You can boost performance in a similar way with Ethernet bonding, or 10G

Re: consistency question

2011-02-15 Thread Edward Capriolo
On Tue, Feb 15, 2011 at 3:59 AM, Serdar Irmak sir...@protel.com.tr wrote: Hi, In a 3 node setup (named A, B, C) with replication factor 3 and quorum read/write scenario; suppose a new value of data X is written to A and B but not C for any reason, then A went down and I fired D with

Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread Edward Capriolo
On Tue, Feb 15, 2011 at 3:03 PM, buddhasystem potek...@bnl.gov wrote: Thank you! It's just that 7.1 seems the bleeding edge now (a serious bug fixed today). Would you still trust it as a production-level service? I'm just slightly concerned. I don't want to create a perception among our IT

Re: Replica details

2011-02-17 Thread Edward Capriolo
On Thu, Feb 17, 2011 at 1:41 PM, A J s5a...@gmail.com wrote: Where can I get good detailed explanation of the various replication options (Simple, Old Network and Network) along with snitches. I did read the definitive guide but not really satisfied. Is there a good post somewhere explaining

Re: Does servers with different capacities in a cluster affect the overall performance?

2011-02-22 Thread Edward Capriolo
On Tue, Feb 22, 2011 at 5:13 AM, XiaoboGu guxiaobo1...@gmail.com wrote: I mean servers with different CPU cores ,memory, or disk space, does Cassandra allow this kind of configuration? This is allowed but managing this may be more difficult in production. Most settings are applied globally at

Re: Distribution Factor: part of the solution to many-CF problem?

2011-02-22 Thread Edward Capriolo
On Mon, Feb 21, 2011 at 5:14 PM, David Boxenhorn da...@lookin2.com wrote: No, that's not what I mean at all. That message is about the ability to use different partitioners for different CFs, say, RandomPartitioner for one, OPP for another. I'm talking about defining how many nodes a CF

Re: Distribution Factor: part of the solution to many-CF problem?

2011-02-22 Thread Edward Capriolo
the  CF the key is storing in. Aaron On 23/02/2011, at 6:01 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Mon, Feb 21, 2011 at 5:14 PM, David Boxenhorn da...@lookin2.com wrote: No, that's not what I mean at all. That message is about the ability to use different partitioners

Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 2:30 PM, jeremy.truel...@barclayscapital.com wrote: Yeah I set the tokens, I’m more asking if I start the first seed node with autobootstrap set to false the second seed should have it set to true as well as all the slave nodes correct? I didn’t see this in the docs but

Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
to 'auto_bootstrap: false' in their .yaml file. -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Wednesday, February 23, 2011 2:36 PM To: user@cassandra.apache.org Cc: Truelove, Jeremy: IT (NYK) Subject: Re: Multiple Seeds On Wed, Feb 23, 2011 at 2:30 PM

Re: Multiple Seeds

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 3:28 PM, jeremy.truel...@barclayscapital.com wrote: So does cassandra monitor the config file for changes? If it doesn't how else would it know unless you restart you had added a new seed? -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com

Re: Will the large datafile size affect the performance?

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 4:51 PM, buddhasystem potek...@bnl.gov wrote: I know that theoretically it should not (apart from compaction issues), but maybe somebody has experience showing otherwise: My test cluster now has 250GB of data and will have 1.5TB in its reincarnation. If all these data

Re: New Chain for : Does Cassandra use vector clocks

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 9:28 PM, Ritesh Tijoriwala tijoriwala.rit...@gmail.com wrote: I was about to ask what Anthony's latest post below captures - if we don't have vector clocks and no locking, how does cassandra prevent/detect conflicts? This is somewhat related to the question I asked in

A simple script that creates multi node clusters on a single machine.

2011-02-23 Thread Edward Capriolo
On the mailing list and IRC there are many questions about Cassandra internals. I understand where the questions are coming from because it took me a while to get a grip on it. However if you have a laptop with a decent amount of RAM 2 GB is enough for 3-5 nodes, (4GB is better). You can kick up

Re: Fill disks more than 50%

2011-02-23 Thread Edward Capriolo
On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, Given that you have always increasing key values (timestamps) and never delete and hardly ever overwrite data. If you want to minimize work on rebalancing and statically assign (new) token ranges to

Re: Fill disks more than 50%

2011-02-24 Thread Edward Capriolo
which will? remove all the unneeded keys? Thanks, Thibaut On Thu, Feb 24, 2011 at 4:22 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Wed, Feb 23, 2011 at 9:39 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: Hi, Given that you have always increasing key values (timestamps

Re: New Chain for : Does Cassandra use vector clocks

2011-02-24 Thread Edward Capriolo
On Thu, Feb 24, 2011 at 3:03 PM, A J s5a...@gmail.com wrote: yes, that is difficult to digest and one has to be sure if the use case can afford it. Some other NOSQL databases deals with it differently (though I don't think any of them use atomic 2-phase commit). MongoDB for example will ask

Re: Understanding Indexes

2011-02-24 Thread Edward Capriolo
On Thu, Feb 24, 2011 at 3:34 PM, mcasandra mohitanch...@gmail.com wrote: I wasn't aware that there is an index on primary key (that is row keys). So from what I understand there is by default an index on for eg: , in below example? Where can I read more about it? UserProfile = { //

Re: Understanding Indexes

2011-02-24 Thread Edward Capriolo
On Thu, Feb 24, 2011 at 3:55 PM, mcasandra mohitanch...@gmail.com wrote: Either I am not explaining properly or I don't understand the data model just yet. Please check again: In below example this is what I understand: 1) UserProfile is a CF 2) is a row key 3) username is a column.

Re: New Chain for : Does Cassandra use vector clocks

2011-02-24 Thread Edward Capriolo
On Thu, Feb 24, 2011 at 3:56 PM, A J s5a...@gmail.com wrote: While we are at it, there's more to consider than just CAP in distributed :) http://voltdb.com/blog/clarifications-cap-theorem-and-data-related-errors On Thu, Feb 24, 2011 at 3:31 PM, Edward Capriolo edlinuxg...@gmail.com wrote

Re: Fill disks more than 50%

2011-02-25 Thread Edward Capriolo
On Fri, Feb 25, 2011 at 7:38 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: @Thibaut Britz Caveat:Using simple strategy. This works because cassandra scans data at startup and then serves what it finds. For a join for example you can rsync all the data from the node below/to the right

Re: Storing photos, images, docs etc.

2011-03-01 Thread Edward Capriolo
On Tue, Mar 1, 2011 at 1:43 PM, mcasandra mohitanch...@gmail.com wrote: Is it advisable or ok to store photos, images and docs in cassandra where you expect high volume of uploads and views? I was reading about facebook implementation of haystack to store the photos. They don't put anything

Re: Storing photos, images, docs etc.

2011-03-03 Thread Edward Capriolo
On Thu, Mar 3, 2011 at 2:49 PM, mcasandra mohitanch...@gmail.com wrote: Has anyone heard about lustre distributed file system? I am wondering if it will work well where keep the metadata in Cassandra and images in Lustre. I looked at MogileFS but not too sure about its support. -- View

Re: Poor performance on small data set

2011-03-11 Thread Edward Capriolo
On Fri, Mar 11, 2011 at 11:44 AM, Peter Schuller peter.schul...@infidyne.com wrote: There is less than 1000 rows and i've got a 75-100ms to get one row by id With memcached it's 2ms I don't know where is the problem. jvm ? cassandra ? phpcassa ? What can i do to detect where is the

Re: Is column update column-atomic or row atomic?

2011-03-15 Thread Edward Capriolo
On Tue, Mar 15, 2011 at 5:46 PM, buddhasystem potek...@bnl.gov wrote: Sorry for the rather primitive question, but it's not clear to me if I need to fetch the whole row, add a column as a dictionary entry and re-insert it if I want to expand the row by one column. Help will be appreciated.

Re: Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread Edward Capriolo
On Wed, Mar 16, 2011 at 9:58 PM, buddhasystem potek...@bnl.gov wrote: Dear All, this is from my new Cassandra server. It obviously uses hyperthreading, I just don't know how to translate this to concurrent readers and writers in cassandra.yaml -- can somebody take a look and tell me what

Re: Replacing a dead seed

2011-03-17 Thread Edward Capriolo
On Thu, Mar 17, 2011 at 9:09 AM, Jonathan Colby jonathan.co...@gmail.com wrote: Hi - If a seed crashes (i.e., suddenly unavailable due to HW problem),   what is the best way to replace the seed in the cluster? I've read that you should not bootstrap a seed.  Therefore I came up with this

Re: Optimizing a few nodes to handle all client connections?

2011-03-19 Thread Edward Capriolo
On Fri, Mar 18, 2011 at 9:55 PM, Jason Harvey alie...@gmail.com wrote: Hola everyone, I have been considering making a few nodes only manage 1 token and entirely dedicating them to talking to clients. My reasoning behind this is I don't like the idea of a node having a dual-duty of handling

Re: Working backwards from production to staging/dev

2011-03-26 Thread Edward Capriolo
On Fri, Mar 25, 2011 at 2:11 PM, ian douglas i...@armorgames.com wrote: On 03/25/2011 10:12 AM, Jonathan Ellis wrote: On Fri, Mar 25, 2011 at 11:59 AM, ian douglas i...@armorgames.com wrote: (we're running v0.60) I don't know if you could hear that from where you are, but our whole office

Re: Starter GUI Tool for Windows

2011-03-26 Thread Edward Capriolo
I don't know. Apache web server is a patchy web server, but crapsandra just no way to put that in a good light. On Friday, March 25, 2011, Dario Bravo darbr...@gmail.com wrote: People: Crapssandra. I'm starting a Cassandra project and starting to learn about this beautiful Cassandra, so I

Re: Starter GUI Tool for Windows

2011-03-27 Thread Edward Capriolo
info on selected nodes... Tomorrow I'll be adding a bunch of new features, I hope. 2011/3/26 Edward Capriolo edlinuxg...@gmail.com I don't know. Apache web server is a patchy web server, but crapsandra just no way to put that in a good light. On Friday, March 25, 2011, Dario Bravo darbr

Re: International language implementations

2011-03-29 Thread Edward Capriolo
On Tue, Mar 29, 2011 at 5:54 PM, A J s5a...@gmail.com wrote: Example, taobao.com is a chinese online bid site. All data is chinese and they use Mongodb successfully. Are there similar installations of cassandra where data is non-latin ? I know in theory, it should all work as cassandra has

Re: How to determine if repair need to be run

2011-03-30 Thread Edward Capriolo
On Wed, Mar 30, 2011 at 12:54 PM, Peter Schuller peter.schul...@infidyne.com wrote: Note this script doesn't work if your repair takes hours, and in the middle of the repair cassandra was restarted, nodetool will exit and the flagfile will be updated.   Another case, if repair hangs, and day

Re: Two column families or One super column family?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 3:52 AM, T Akhayo t.akh...@gmail.com wrote: Hi Aaron, Thank you for your reply, i appreciate the suggestions you made. Yesterday i managed to get everything (our main read) in one CF, with the use of a structure in a value like you suggested. Designing a new data

Re: Not able to set ZERO consistency level

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 2:53 PM, Peter Schuller peter.schul...@infidyne.com wrote: Only the following Levels are provided, I am wondering if the ZERO consistency level is removed in Cassandra 0.7.X ? Yes, it's gone. If so, Could you please explain why was it removed and what is the best

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 6:15 PM, Eric Gilmore e...@datastax.com wrote: A script that I have says the following: $ python ctokens.py How many nodes are in your cluster? 2 node 0: 0 node 1: 85070591730234615865843651857942052864 The first token should be zero, for the reasons discussed here:
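The ctokens.py script quoted above is not shown in the thread. As a minimal sketch of how evenly spaced tokens like these could be computed, assuming RandomPartitioner's 0 to 2**127 token range (the function name here is hypothetical, not taken from that script):

```python
def initial_tokens(node_count):
    """Evenly spaced tokens over an assumed 0..2**127 ring (RandomPartitioner)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Node i gets the token i/node_count of the way around the ring.
    return [i * ring_size // node_count for i in range(node_count)]


for index, token in enumerate(initial_tokens(2)):
    print(f"node {index}: {token}")
```

For two nodes this reproduces the values in the message: 0 for the first node and 2**126 for the second.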

Re: Ditching Cassandra

2011-03-31 Thread Edward Capriolo
Gregori, Congrats on winning the fud-liest post of the month award. Firstly, if you don't like updates, give up on computers and software. Especially give up on anything that has to do with nosql because it is fast evolving. If you think you have a problem with the cassandra api, then what you

Re: nodetool cfstathistogram error

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 8:25 PM, mcasandra mohitanch...@gmail.com wrote: It looks like if I use system schema it fails. Is it because of LocalPartitioner? I ran with other keyspace and got following output. Offset SSTables Write Latency Read Latency Row Size Column Count 1 0 0 0 0 0 2 0 0

Re: Node added, no performance boost -- are the tokens correct?

2011-04-01 Thread Edward Capriolo
On Fri, Apr 1, 2011 at 1:15 PM, Peter Schuller peter.schul...@infidyne.com wrote: Now, I moved the tokens. I still observe that read latency deteriorated with 3 machines vs original one. Replication factor is 1, Cassandra version 0.7.2 (didn't have time to upgrade as I need results by this

Re: Bizarre side-effect of increasing read concurrency

2011-04-01 Thread Edward Capriolo
On Fri, Apr 1, 2011 at 11:27 PM, Jason Harvey alie...@gmail.com wrote: On further analysis, it looks like this behavior occurs when a node is simply restarted. Is that normal behavior? If mark-and-sweep becomes less and less effective over time, does that suggest an issue with GC, or an issue

Re: Endless minor compactions after heavy inserts

2011-04-03 Thread Edward Capriolo
On Sun, Apr 3, 2011 at 1:46 PM, Sheng Chen chensheng2...@gmail.com wrote: I think if i can keep a single sstable file in a proper size, the hot data/index files may be able to fit into memory at least in some occasions. In my use case, I want to use cassandra for storage of a large amount of

Re: Embedding Cassandra in Java code w/o using ports

2011-04-04 Thread Edward Capriolo
On Mon, Apr 4, 2011 at 8:29 AM, aaron morton aa...@thelastpickle.com wrote: I'm interested to know more about the problems using the CLI. Aaron. On 2 Apr 2011, at 15:07, Bob Futrelle wrote: Connecting via CLI to local host with a port number has never been successful for me in Snow

Re: selecting random columns ..

2011-04-08 Thread Edward Capriolo
On Fri, Apr 8, 2011 at 4:48 AM, Sasha Dolgy sdo...@gmail.com wrote: hi all, is there a way to select random columns from a key? -- Sasha Dolgy sasha.do...@gmail.com getRangeSlice with random column start key.
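The idea in the reply above (start a slice at a randomly chosen column name) can be illustrated without a cluster. This is only a sketch: it models a row as a sorted list of integer column names and makes no Cassandra client calls. The function name `slice_from_random_start` and the `name_space` parameter are hypothetical.

```python
import bisect
import random


def slice_from_random_start(sorted_names, count, name_space, rng=random):
    """Pick a random start key and return up to `count` column names
    at or after it, mimicking a slice with a random start column.

    sorted_names: column names of one row, in comparator (sorted) order.
    name_space:   exclusive upper bound of possible integer column names.
    """
    start = rng.randrange(name_space)
    # First column whose name is >= the random start key.
    idx = bisect.bisect_left(sorted_names, start)
    return sorted_names[idx:idx + count]


if __name__ == "__main__":
    names = [3, 10, 42, 77, 90]
    print(slice_from_random_start(names, 2, 100))
```

Note that this is not uniformly random: a column that follows a large gap in the name space is chosen more often, and a start key past the last column returns fewer than `count` columns (possibly none).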

Re: database design

2011-04-13 Thread Edward Capriolo
On Wed, Apr 13, 2011 at 10:39 AM, Jean-Yves LEBLEU jleb...@gmail.com wrote: Hi all, Just some thoughts and questions I have about cassandra data modeling. If I understand correctly, cassandra is better at writing than at reading. So you have to think about your queries to design a cassandra schema.

Re: Quick Poll: Server names

2010-07-27 Thread Edward Capriolo
On Tue, Jul 27, 2010 at 11:49 AM, uncle mantis uncleman...@gmail.com wrote: Ah S**T! The Pooh server is down again! =) What does one do if they run out of themed names? Regards, Michael On Tue, Jul 27, 2010 at 10:46 AM, Brett Thomas brettptho...@gmail.com wrote: I like names of

Re: how to recover cassandra data

2010-08-02 Thread Edward Capriolo
On Mon, Aug 2, 2010 at 9:11 AM, john xie shanfengg...@gmail.com wrote: ReplicationFactor = 3. One day I stopped 192.168.1.147 and removed the cassandra data by mistake. Can I recover 192.168.1.147's cassandra data by restarting cassandra? DataFileDirectories

Re: unable to start cassandra

2010-08-03 Thread Edward Capriolo
On Tue, Aug 3, 2010 at 10:47 AM, Maciej Lisowski m.lisow...@powerprice.pl wrote: Hi all, I’m new here and new with Cassandra and I’ve got problem to run it (v. 0.6.4) with jdk1.6.0_21. When I type “cassandra” to run it I get error: ERROR 16:23:53,803 Uncaught exception in thread

Re: unable to start cassandra

2010-08-03 Thread Edward Capriolo
On Tue, Aug 3, 2010 at 11:44 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Tue, Aug 3, 2010 at 10:47 AM, Maciej Lisowski m.lisow...@powerprice.pl wrote: Hi all, I’m new here and new with Cassandra and I’ve got problem to run it (v. 0.6.4) with jdk1.6.0_21. When I type “cassandra

Growing commit log directory.

2010-08-09 Thread Edward Capriolo
I have a 16 node 6.3 cluster and two nodes from my cluster are giving me major headaches. 10.71.71.56 Up 58.19 GB 10827166220211678382926910108067277| ^ 10.71.71.61 Down 67.77 GB 123739042516704895804863493611552076888v | 10.71.71.66 Up 43.51 GB

Re: Growing commit log directory.

2010-08-09 Thread Edward Capriolo
On Mon, Aug 9, 2010 at 8:20 PM, Jonathan Ellis jbel...@gmail.com wrote: what does tpstats or other JMX monitoring of the o.a.c.concurrent stages show? On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I have a 16 node 6.3 cluster and two nodes from my cluster

a plea not to remove rowsize warning

2010-08-11 Thread Edward Capriolo
Hello all, I recently posted on list about a situation where two of the nodes in my 16-node cluster were garbage collecting and OOMing. I was able to move my Xmx from 9 GB to 11 GB to see that rather than my memory saw tooth. I would saw tooth around 4 GB before memory shot up like a rocket. After

Re: indexing rows ordered by int

2010-08-15 Thread Edward Capriolo
On Sunday, August 15, 2010, S Ahmed sahmed1...@gmail.com wrote: For CF that I need to perform range scans on, I create separate CF that have custom ordering. Say a CF holds comments on a story (like comments on a reddit or digg story post) So if I need to order comments by votes, it seems I

Hive Storage Handler for Cassandra

2010-08-16 Thread Edward Capriolo
Hello, Anyone interested in doing map/reduce on Cassandra data should take a look at Cassandra Storage Handler for Hive. Storage handlers give Hive the ability to work with data outside HDFS in a more natural way. Support is now in place for reading and writing to/from Standard Column Families

Re: cache sizes using percentages

2010-08-17 Thread Edward Capriolo
On Tue, Aug 17, 2010 at 1:55 PM, Artie Copeland yeslinux@gmail.com wrote: If I set a key cache size of 100%, the way I understand it works is: - the cache is not write through, but read through - a key gets added to the cache on the first read if not already available - the size of
