Re: Performance deterioration while building secondary index

2011-09-16 Thread buddhasystem
Well, the problem is still there, i.e. I tried to add one more index and the 3-node cluster is just going spastic, becomes unresponsive etc. These boxes have plenty of CPU and memory. -- View this message in context:

Re: Node added, no performance boost -- are the tokens correct?

2011-04-01 Thread buddhasystem
:) On Thu, Mar 31, 2011 at 3:06 PM, buddhasystem potek...@bnl.gov wrote: I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance. Address Status

Netstats out of sync?

2011-03-31 Thread buddhasystem
I'm rebalancing a cluster of 2 nodes at this point. Netstats on the source node reports progress of the stream, whereas on the receiving end netstats states that progress = 0. Did anyone see that? Do I need both nodes listed as seeds in cassandra.yaml? TIA

Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance. Address Status State Load Owns Token

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
Yup, I screwed up the token setting, my bad. Now, I moved the tokens. I still observe that read latency deteriorated with 3 machines vs original one. Replication factor is 1, Cassandra version 0.7.2 (didn't have time to upgrade as I need results by this weekend). Key and row caching was disabled

Re: data aggregation in Cassandra

2011-03-25 Thread buddhasystem
Hello Saurabh, I have a similar situation, with a more complex data model, and I do an equivalent of map-reduce by hand. The redeeming value is that you have complete freedom in how you hash, and you design the way you store indexes and similar structures. If there is a pattern in data store, you

Re: cassandra nodes with mixed hard disk sizes

2011-03-22 Thread buddhasystem
aaron morton wrote: Also a node is responsible for storing its token range and acting as a replica for other token ranges. So reducing the token range may not have a dramatic effect on the storage requirements. Aaron, is there a way to configure wimpy nodes such that the replicas

Re: Deleting old SSTables

2011-03-22 Thread buddhasystem
Jonathan, for all of us who just tinker with test clusters, building confidence in the product, it would be nice to be able to do the same with nodetool, without jconsole, just my 0.5 penny. Thanks. Jonathan Ellis-3 wrote: From the next paragraph of the same wiki page: SSTables that are

Re: 0.7.2 choking on a 5 MB column

2011-03-22 Thread buddhasystem
Jonathan, wide rows have been discussed. I thought that the limit on number of columns is way bigger than 45k. What can one expect in reality?

Re: 0.7.2 choking on a 5 MB column

2011-03-22 Thread buddhasystem
I see. I'm doing something even more drastic then, because I'm only inserting one row in this case, and just use cf.insert(), without batch mutator. It didn't occur to me that was a bad idea. So I take it, this method will fail. Hmm.

Re: Reading whole row vs a range of columns (pycassa)

2011-03-20 Thread buddhasystem
understand your reference to the OPP in the context of reading 100 columns from a row. Aaron On 19 Mar 2011, at 16:22, buddhasystem wrote: > As I'm working on this further, I want to understand this: > > Is it advantageous to flatten data in blocks (strings) each

Undead rows after nodetool compact

2011-03-18 Thread buddhasystem
This has been discussed once, but I don't remember the outcome. I insert a row and then delete the key immediately. I then run nodetool compact. In cassandra-cli, list cf still returns 1 empty row. This is not a showstopper but damn unpretty. Is there a way to make deleted rows go away immediately?

Reading whole row vs a range of columns (pycassa)

2011-03-18 Thread buddhasystem
Is there a noticeable difference in speed between reading the whole row through Pycassa, vs a range of columns? Both rows and columns are pretty slim.

Re: Reading whole row vs a range of columns (pycassa)

2011-03-18 Thread buddhasystem
As I'm working on this further, I want to understand this: Is it advantageous to flatten data in blocks (strings) each containing a series of objects, if I know that a serial object read is often likely, but don't want to resort to OPP? I worked out the optimal granularity, it seems. Is it better

Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Hello, in the instructions, I need to link concurrent_reads to number of drives. Is this related to number of physical drives that I have in my RAID0, or something else?

Re: Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks to all for replying, but frankly I didn't get the answer I wanted. Does the number of disks apply to number of spindles in RAID0? Or something else like a separate disk for commitlog and for data?

Re: Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks Peter, I can see it better now.

Re: Does concurrent_reads relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Where and how do I choose it?

Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Dear All, this is from my new Cassandra server. It obviously uses hyperthreading, I just don't know how to translate this to concurrent readers and writers in cassandra.yaml -- can somebody take a look and tell me what number of cores I need to assume for concurrent_reads and concurrent_writes. Is
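The distinction being asked about (hyperthreads versus physical cores) can be read out of /proc/cpuinfo mechanically. A minimal sketch, assuming the usual x86 Linux layout in which each `processor` block carries `physical id` and `core id` fields; the function name `count_cpus` is made up for illustration, and the sketch only reports both numbers without deciding which one a cassandra.yaml setting should be based on:

```python
def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo text."""
    logical = 0
    cores = set()          # unique (physical id, core id) pairs
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            # A new logical processor block starts here.
            logical += 1
            physical_id = None
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            cores.add((physical_id, value))
    # Fall back to the logical count if the core fields are absent.
    return logical, (len(cores) or logical)
```

On a live machine this could be called as `count_cpus(open("/proc/cpuinfo").read())`; if the first number is twice the second, hyperthreading is most likely enabled.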

Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Hello Peter, thanks for the note. I'm not looking for anything fancy. It's just when I'm looking at the following bit of Pycassa docs, it's not 100% clear to me that it won't overwrite the entire row for the key, if I want to simply add an extra column {'foo':'bar'} to the already existing row. I
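For readers with the same doubt: in Cassandra an insert touches only the columns that are supplied, so adding {'foo':'bar'} leaves the other columns of the row in place. The following is a toy in-memory model of that merge behaviour, not pycassa code; the class name `ToyColumnFamily` is hypothetical and no server is involved:

```python
class ToyColumnFamily:
    """In-memory stand-in for a column family: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        # Merge the supplied columns into the row; columns not mentioned
        # in this call are left untouched.
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self.rows[key])
```

With this model, inserting `{'a': '1'}` and then `{'foo': 'bar'}` under the same key yields a row holding both columns, so there is no need to fetch the whole row first.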

Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Thanks for clarification, Tyler, sorry again for the basic question. I've been doing straight inserts from Oracle so far but now I need to update rows with new columns.

Re: Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Thanks! Docs say it's good to set it to 8*Ncores, are you saying you see 8 cores in this output? I know I need to go way above default 32 with this setup.

Is column update column-atomic or row atomic?

2011-03-15 Thread buddhasystem
Sorry for the rather primitive question, but it's not clear to me if I need to fetch the whole row, add a column as a dictionary entry and re-insert it if I want to expand the row by one column. Help will be appreciated.

Re: Is column update column-atomic or row atomic?

2011-03-15 Thread buddhasystem
Thanks. Can you give me a pycassa example, if possible? Thanks!

Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread buddhasystem
Tyler, as a collateral issue - I've been wondering for a while what advantage if any it buys me, if I declare a value 'long' (which it roughly is) as opposed to passing around strings. String is flattened onto a replica of itself, I assume? No conversion? Maybe it even means better speed. Thanks,

null vs value not found?

2011-02-24 Thread buddhasystem
I'm doing insertion with a pycassa client. It seems to work in most cases, but sometimes, when I go to cassandra-cli and query with the key and column that I inserted, I get null whereas I shouldn't. What could be the causes of that?

Re: null vs value not found?

2011-02-24 Thread buddhasystem
thresholds: 4/32 Read repair chance: 1.0 Built indexes: [] I pretty much went with the default settings, and the column name is 'CATALOG'. Maxim Tyler Hobbs-2 wrote: On Thu, Feb 24, 2011 at 2:27 PM, buddhasystem potek...@bnl.gov wrote: I'm doing insertion with a pycassa

Re: null vs value not found?

2011-02-24 Thread buddhasystem
Thanks! You are right. I see exception but have no idea what went wrong. ERROR [ReadStage:14] 2011-02-24 21:51:29,374 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[ReadStage:14,5,main] java.io.IOError: java.io.EOFException at

Re: Homebrew CF-indexing vs secondary indexing

2011-02-24 Thread buddhasystem
FWIW, for me the advantage of homebrew indexes is that they can be a lot more sophisticated than the standard -- I can hash combinations of column values to whatever I want. I also put counters on column values in the index, so there is lots of functionality. Of course, I can do it because my

Will the large datafile size affect the performance?

2011-02-23 Thread buddhasystem
I know that theoretically it should not (apart from compaction issues), but maybe somebody has experience showing otherwise: My test cluster now has 250GB of data and will have 1.5TB in its reincarnation. If all this data is in a single CF -- will it cause read or write performance problems?

Can I count on Super Column Families when planning 3 years out?

2011-02-23 Thread buddhasystem
There was a discussion here on how well (or not so well) the Super CFs are supported. I now need to make a strategic decision as to how I plan my data. What's the consensus -- will the super CF be there 3 years out? TIA Maxim

How come key cache increases speed by x4?

2011-02-23 Thread buddhasystem
Well I know the cache is there for a reason, I just can't explain the factor of 4 when I run my queries on a hot vs cold cache. My queries are actually a chain of one on an inverted index, which produces a tuple of keys to be used in the main query. The inverted index query should be downright

Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem
I've been too smart for my own good trying to type columns, on the theory that it would later increase performance by having more efficient comparators in place. So if a string represents an integer, I would convert it to an integer and declare the column as such. Same for LONG. What I found is

Re: Virtues and pitfall of using TYPES?

2011-02-18 Thread buddhasystem
is pretty close (just a size check). If you meant that the conversion is killing performance on your client, you should switch to a more performant client language. :) On Fri, Feb 18, 2011 at 9:56 PM, buddhasystem potek...@bnl.gov wrote: I've been too smart for my own good trying to type

Re: create additional secondary index

2011-02-16 Thread buddhasystem
I sidestep this problem by using a Python script (pycassa-based) where I configure my CFs. This way, it's reproducible and documented.

What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem
Hello, we are acquiring new hardware for our cluster and will be installing it soon. It's likely that I won't need to rely on secondary index functionality, as data will be write-once read-many and I can get away with inverse index creation at load time, plus I have some more complex indexing in

Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem
Thank you! It's just that 0.7.1 seems the bleeding edge now (a serious bug fixed today). Would you still trust it as a production-level service? I'm just slightly concerned. I don't want to create a perception among our IT that the product is not ready for prime time.

Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem
Thank you Attila! We will indeed have a few months of breaking in. I suppose I'll keep my fingers crossed and see that 0.7.X is very stable. So I'll deploy 0.7.1 -- I will need to apply all the patches, there is no cumulative download, is that correct? Attila Babo wrote: 0.6.8 is stable and

Re: Column name size

2011-02-11 Thread buddhasystem
I've been thinking about this as well. I'm migrating data from a large Oracle database, and the RDBMS column names are descriptive (good) and long (bad). For now I just keep them when populating Cassandra, but I can shave off about 30% of storage by hashing names. I don't need any automation and
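One hypothetical way to shorten long column names is sketched below. It is an illustration only: the use of a truncated MD5 digest, the alias length of 6, and the function names are all assumptions, and the post does not say how the author actually hashes names. The mapping has to be kept somewhere so that short names can be translated back when reading.

```python
import hashlib


def short_name(long_name, length=6):
    """Derive a short, deterministic alias from a descriptive column name."""
    return hashlib.md5(long_name.encode("utf-8")).hexdigest()[:length]


def build_alias_map(column_names, length=6):
    """Map each long name to its alias, failing loudly on a collision."""
    aliases = {}   # long name -> alias
    owners = {}    # alias -> long name, used to detect collisions
    for name in column_names:
        alias = short_name(name, length)
        if owners.get(alias, name) != name:
            raise ValueError("alias collision for %r" % name)
        owners[alias] = name
        aliases[name] = alias
    return aliases
```

A truncated hash can collide, which is why the sketch checks for it; a hand-written dictionary of abbreviations would avoid that risk at the cost of maintenance.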

Re: Limit on amount of CFs

2011-02-11 Thread buddhasystem
I asked a similar question (but didn't receive an answer). I'm trying to see if a large number of CFs might be beneficial. One thing I can think about is the size of extra storage needed for compaction -- obviously it will be smaller in case of many smaller CFs.

Re: Calculating the size of rows in KBs

2011-02-11 Thread buddhasystem
Does it also mean that the whole row will be deserialized when a query comes just for one column?

Re: Specifying row caching on per query basis ?

2011-02-09 Thread buddhasystem
Jonathan, what if the data is really homogeneous, but over a long period of time. I decided that the users who hit the database for recent past should have a better ride. Splitting into a separate CF also has costs, right? In fact, if I were to go this way, do you think I can crank down the key

What will happen if I try to compact with insufficient headroom?

2011-02-09 Thread buddhasystem
One of my nodes is 76% full. I know that one of CFs represents 90% of the data, others are really minor. Can I still compact under these conditions? Will it crash and lose the data? Will it try to create one very large file out of fragments, for that dominating CF? TIA

Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem
Seeing that discussion here about indexes not supported in superCFs, and less than clear future of superCFs altogether, I was thinking about getting a modicum of same functionality with serialized objects inside columns. This way the column key becomes sort of analog of supercolumn key, and I

Re: Can serialized objects in columns serve as ersatz superCFs?

2011-02-08 Thread buddhasystem
Thanks for the comment! In my case, I want to store various time slices as indexes, so the content can be serialized as comma-separated concatenation of unique object IDs. Example: on 20101204, multiple clouds experienced a variety of errors in job execution. In addition, multiple users ran (or
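The serialization described here, a comma-separated concatenation of unique object IDs stored as a column value, could look roughly like this. It is a sketch under the assumption that IDs are integers; the helper names `pack_ids` and `unpack_ids` are made up:

```python
def pack_ids(ids):
    """Serialize unique object IDs as one comma-separated string."""
    # Deduplicate and sort so the same set of IDs always packs identically.
    return ",".join(str(i) for i in sorted(set(ids)))


def unpack_ids(value):
    """Turn a packed column value back into a list of integer IDs."""
    return [int(part) for part in value.split(",")] if value else []
```

In an index row keyed by a date such as 20101204, each column value would then hold one packed list, for example the IDs of all jobs that failed on a given cloud that day.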

Java bombs during compaction, please help

2011-02-07 Thread buddhasystem
Hello, one node in my 3-machine cluster cannot perform compaction. I tried multiple times, it ran out of heap space once and I increased it. Now I'm getting the dump below (after it does run for a few minutes). I hope somebody can shed a little light on what's going on, because I'm at a loss and

Re: Java bombs during compaction, please help

2011-02-07 Thread buddhasystem
Thanks Jonathan -- does it mean that the machine is experiencing IO problems?

Re: Finding the intersection results of column sets of two rows

2011-02-06 Thread buddhasystem
Hello, If the amount of data is _that_ small, you'll have a much easier life with MySQL, which supports the join procedure -- because that's exactly what you want to achieve. asil klin wrote: Hi all, I want to procure the intersection of columns set of two rows (from 2 different column

How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem
Just wanted to see if someone with experience in running an actual service can advise me: how often do you run nodetool compact on your nodes? Do you stagger it in time, for each node? How badly is performance affected? I know this all seems too generic but then again no two clusters are

Re: How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem
Thanks Edward. In our usage scenario, there is never downtime, it's a global 24/7 operation. What is impacted the worst, the read or write? How does a node handle compaction when there is a spike of writes coming to it? Edward Capriolo wrote: On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem

Re: order of index expressions

2011-02-05 Thread buddhasystem
Jonathan, what's the implementation of that? I.e. is it a product of indexes or nested loops? Thanks, Maxim

Re: Using Cassandra to store files

2011-02-04 Thread buddhasystem
Even when storage is in NFS, Cassandra can still be quite useful as a file catalog. Your physical storage can change, move etc. Therefore, it's a good idea to provide mapping of logical names to physical store points (which in fact can be many). This is a standard technique used in mass storage.

Re: Moving data

2011-02-04 Thread buddhasystem
FWIW, I'm working on migrating a large amount of data out of Oracle into my test cluster. The data has been warehoused as CSV files on Amazon S3. Having that in place allows me to not put extra load on the production service when doing many repeated tests. I then parse the data using CSV Python

Re: Using Cassandra to store files

2011-02-03 Thread buddhasystem
CouchDB

Re: Slow network writes

2011-02-03 Thread buddhasystem
Dude, are you asking me to unsubscribe?

Commit log compaction

2011-02-02 Thread buddhasystem
How often and by what criteria is the commit log compacted/truncated? Thanks, Maxim

Re: Commit log compaction

2011-02-02 Thread buddhasystem
Thank you. So what exactly is the condition that causes the older commit log files to actually be removed? I observe that indeed they are rotated out when the threshold is reached, but then new ones are placed in the directory and the older ones are still there. Thanks, Maxim

Re: Counters in 0.8 -- conditional?

2011-02-02 Thread buddhasystem
Thanks. Just wanted to note that counting the number of rows where foo=bar is a fairly ubiquitous task in db applications. In case of big data, trafficking all these data to client just to count something isn't optimal at all. Maxim

Re: Counters in 0.8 -- conditional?

2011-02-02 Thread buddhasystem
Thanks. Yes I know it's by no means trivial. I thought in case there was an index on the column on which I want to place condition, the index machinery itself can do the counting (i.e. when the index is updated, the counter is incremented). It doesn't seem too orthogonal to the current

Re: Cassandra memory needs

2011-02-02 Thread buddhasystem
Oleg, I just wanted to add that I confirmed the importance of that rule of thumb the hard way. I created two extra CFs and was able to reliably crash the nodes during writes. I guess for the final setting I'll rely on results of my testing. But it's also important to not cause the swap death of

How do I get 0.7.1?

2011-02-02 Thread buddhasystem
Thanks. Maxim

Re: Slow network writes

2011-02-02 Thread buddhasystem
Jonathan, where do I find that contrib/stress? Maxim

Re: How do I get 0.7.1?

2011-02-02 Thread buddhasystem
Stephen, sorry I didn't understand your missive. Maxim

Re: cassandra as session store

2011-02-01 Thread buddhasystem
Most if not all modern web application frameworks support sessions. This applies to Django (with which I have most experience and also run it with X.509 security layer) but also to Ruby on Rails and Pylons. So, why would you re-invent the wheel? Too messy. It's all out there for you to use.

Re: cassandra as session store

2011-02-01 Thread buddhasystem
For completeness: http://stackoverflow.com/questions/3746685/running-django-site-in-multiserver-environment-how-to-handle-sessions http://docs.djangoproject.com/en/dev/topics/http/sessions/#using-cached-sessions I guess your approach does make sense, one only wishes that the servlet in question

TSocket timing out

2011-01-29 Thread buddhasystem
When I do a lot of inserts into my cluster (10k at a time) I get timeouts from Thrift, the TSocket.py module. What do I do? Thanks, Maxim

Re: Cassandra and count

2011-01-28 Thread buddhasystem
As far as I know, there are no aggregate operations built into Cassandra, which means you'll have to retrieve all of the data to count it in the client. I had a thread on this topic 2 weeks ago. It's pretty bad.

Re: Node going down when streaming data, what next?

2011-01-28 Thread buddhasystem
Sorry Aaron but this doesn't help. As I said, machine is dead, kaput, finished. So I can't do decommission. I can remove token to any other node, but -- the dead machine is going to hang around in my ring reports like a zombie.

Re: Node going down when streaming data, what next?

2011-01-28 Thread buddhasystem
It does remove tokens, and the ring shows that the problematic node owns 0 tokens, which is OK. However, it's still there, listed. It's not a bug but kind of like a feature -- you can move that node back in two days later and move tokens in same or different way. What I wish happened was that

Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem
Will it work for a billion rows? Because that's where eventually I'll end up being.

Re: Using Cassandra for storing large objects

2011-01-27 Thread buddhasystem
I would ask myself a different question, which is what media-hosting sites use (YouTube and all others). Cassandra still may have its usefulness here as a mapper between a logical id and physical file location.

Re: Node going down when streaming data, what next?

2011-01-27 Thread buddhasystem
OK, after running repair and waiting overnight the rebalancing worked and now 3 nodes share the load as I expected. However, one node that is broken is still listed in the ring. I have no intention of reviving it. What's the optimal way to get rid of it as far as the ring configuration is

Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
I was moving a node and at some point it started streaming data to 2 other nodes. Later, that node keeled over and let's assume I can't fix it for the next 3 days and just want to move tokens on the remaining three to even out and see if I can live with it. But I can't do that! The node that was

Re: Schema Design

2011-01-26 Thread buddhasystem
Having separate columns for Year, Month etc seems redundant. It's tons more efficient to keep say UTC time in POSIX format (basically integer). It's easy to convert back and forth. If you want to get a range of dates, in that case you might use Order Preserving Partitioner, and sort out which
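The conversion between calendar dates and POSIX time mentioned above is a one-liner in each direction. A minimal sketch using only the Python standard library; the helper names `to_posix` and `from_posix` are made up for illustration:

```python
import calendar
from datetime import datetime, timezone


def to_posix(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC calendar date/time to integer POSIX seconds."""
    return calendar.timegm((year, month, day, hour, minute, second))


def from_posix(ts):
    """Convert integer POSIX seconds back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)
```

A date range then becomes a pair of integers, e.g. `to_posix(2011, 1, 26)` up to `to_posix(2011, 1, 27)`, and Year, Month and Day can be recovered from a stored value with `from_posix` whenever they are needed.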

Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
Bump. I still don't know what the best thing to do is, please help.

Re: Schema Design

2011-01-26 Thread buddhasystem
I used the term sharding a bit frivolously. Sorry. It's just that splitting semantically homogeneous data among CFs doesn't scale too well, as each CF is allocated a piece of memory on the server.

Re: Node going down when streaming data, what next?

2011-01-26 Thread buddhasystem
Hello, from what I know, you don't really have to restart simultaneously, although of course you don't want to wait. I finally decided to use the removetoken command to actually scratch out the sickly node from the cluster. I'll bootstrap it later when it's fixed.

Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
Sorry if this sounds silly, but I can't get my brain around this one: if all nodes contain replicas, why does the cluster stream data every time I move or remove a token? If the data is already there, what needs to be streamed? Thanks Maxim

RE: Why does cassandra stream data when moving tokens?

2011-01-26 Thread buddhasystem
Thanks, I'll look at the configuration again. In the meantime, I can't move the first node in the ring (after I removed the previous node's token) -- it throws an exception and says data is being streamed to it -- however, this is not what netstats says! Weirdness continues... Maxim

Re: Forcing GC w/o jconsole

2011-01-25 Thread buddhasystem
Thanks! It doesn't seem to have any effect on GCing dropped CFs, though. Maxim

Re: Stress test inconsistencies

2011-01-25 Thread buddhasystem
Oleg, I'm a novice at this, but for what it's worth I can't imagine you can have a _sustained_ 1kHz insertion rate on a single machine which also does some reads. If I'm wrong, I'll be glad to learn that I was. It just doesn't seem to square with a typical seek time on a hard drive. Maxim

Re-partitioning the cluster with nodetool: what's happening?

2011-01-25 Thread buddhasystem
I'm trying to re-partition my 4-node cluster to make the load exactly 25% on each node. As per recipes found in documentation, I calculate: for x in xrange(4): ... print 2**127/4*x ... 0 42535295865117307932921825928971026432 85070591730234615865843651857942052864
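The calculation quoted above is a Python 2 interpreter session. A minimal Python 3 sketch of the same arithmetic follows, assuming a token space of 0 to 2**127 as in the original expression and evenly spaced tokens; the function name `balanced_tokens` is made up for illustration:

```python
def balanced_tokens(node_count, ring_size=2**127):
    """Return evenly spaced initial tokens for node_count nodes."""
    # Integer division keeps the arithmetic exact for 127-bit values,
    # matching the Python 2 expression 2**127/4*x from the post.
    return [ring_size // node_count * i for i in range(node_count)]
```

Calling `balanced_tokens(4)` gives the four tokens to assign, one per node; the first three match the values printed in the message.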

Re: Re-partitioning the cluster with nodetool: what's happening?

2011-01-25 Thread buddhasystem
Correction -- what I meant to say is that I do see announcements about streaming in the output, but these are stuck at 0%.

Forcing GC w/o jconsole

2011-01-24 Thread buddhasystem
My situation is similar to one described at this link: http://stackoverflow.com/questions/4155696/how-to-trigger-manual-java-gc-from-linux-console-with-no-x11 I'm trying the following command but it fails (connection refused) java -jar cmdline-jmxclient-0.10.3.jar - localhost:8081

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem
OK, so I'm looking at this page: http://wiki.apache.org/cassandra/MemtableSSTable This looks promising: A compaction marker is also added to obsolete sstables so they can be deleted on startup if the server does not perform a GC before being restarted. So it would seem that if I restart the

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem
Thanks for the note, yes, I do know what files I don't need anymore. And, I do realize the difference between grace period of CFs, and garbage collection (or at least I hope I do). On the face value, documentation wasn't precise enough about JVM GC taking care of dropped CFs. I understand this

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-24 Thread buddhasystem
Thanks Aaron. As I remarked earlier (and it seems it's not uncommon) none of the nodes have X11 installed (I think I could arrange this, but it's a bit of a hassle). So if I understand correctly, jconsole is an X11 app, and I'm out of luck with that. I would agree with you that having a proper

Multiple indexes - how does Cassandra handle these internally?

2011-01-21 Thread buddhasystem
Greetings -- if I use multiple secondary indexes in the query, what will Cassandra do? Some examples say it will index on first EQ and then loop on others. Does it ever do a proper index product to avoid inner loops? Thanks Maxim -- View this message in context:

Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem
Greetings, I just used nodetool to force a major compaction on my cluster. It seems like the CFs currently in service were indeed compacted, while the old test materials (which I dropped from CLI) were still there as tombstones. Is that the expected behavior? Hmm... TIA.

Re: Does Major Compaction work on dropped CFs? Doesn't seem so.

2011-01-20 Thread buddhasystem
Thanks! What's strange anyhow is that the GC period for these cfs expired some days ago. I thought that a compaction would take care of these tombstones. I used nodetool to compact.