Re: Node forgets about most of its column families

2012-08-28 Thread Peter Schuller
to disablegossip and make other nodes not send requests to it. disabling thrift would also be advised, or even firewalling it prior to restart. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-29 Thread Peter Schuller
I can recommend Jolokia highly for providing an HTTP/JSON interface to JMX (it can be trivially run in agent mode by just altering JVM args): http://www.jolokia.org/ -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
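
As a rough illustration of what the HTTP/JSON interface looks like from a client, here is a minimal Python sketch. It assumes a Jolokia JVM agent is already attached and reachable at localhost on port 8778 (Jolokia's usual default; adjust to your setup). The helper names read_url and read_attribute are made up for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: a Jolokia JVM agent is attached and listening here.
JOLOKIA_BASE = "http://localhost:8778/jolokia"


def read_url(mbean, attribute, base=JOLOKIA_BASE):
    """Build the URL of a Jolokia 'read' request for one MBean attribute."""
    # ':', '=' and ',' are ordinary parts of an MBean name, so leave them as-is.
    return "{}/read/{}/{}".format(base, quote(mbean, safe=":=,"), attribute)


def read_attribute(mbean, attribute, base=JOLOKIA_BASE):
    """Fetch one attribute; Jolokia wraps the result in a JSON 'value' field."""
    with urlopen(read_url(mbean, attribute, base)) as resp:
        return json.load(resp)["value"]


if __name__ == "__main__":
    # Only prints the URL; call read_attribute() against a live agent.
    print(read_url("java.lang:type=Memory", "HeapMemoryUsage"))
```

Against a live agent, read_attribute("java.lang:type=Memory", "HeapMemoryUsage") should return the heap usage as a dictionary. MBean names containing slashes need Jolokia's own escaping, which this sketch does not handle.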

Re: Memory Usage of a connection

2012-08-30 Thread Peter Schuller
amounts of data? Large or many columns (or both), etc. Essentially all working data that your request touches is allocated on the heap and contributes to allocation rate and ParNew frequency. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: force gc?

2012-09-02 Thread Peter Schuller
is not as compact as PostgreSQL. For example column names are duplicated in each row, and the row key is duplicated twice (once in index, once in data). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: force gc?

2012-09-02 Thread Peter Schuller
I think that was clear from your post. I don't see a problem with your process. Setting gc grace to 0 and forcing compaction should indeed return you to the smallest possible on-disk size. (But may be unsafe as documented; can cause deleted data to pop back up, etc.) -- / Peter Schuller

Re: force gc?

2012-09-02 Thread Peter Schuller
over all rows: for row_id, row in your_column_family.get_range(): https://github.com/pycassa/pycassa -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
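
The loop in the snippet above can be tried without a cluster. The sketch below uses a small stand-in class (FakeColumnFamily, invented for this example) that mimics only the one pycassa behaviour relied on here: get_range() yielding (row key, columns) pairs. With a real cluster, the object would instead be a pycassa ColumnFamily built from a ConnectionPool.

```python
class FakeColumnFamily:
    """Stand-in for a pycassa ColumnFamily; only get_range() is modelled."""

    def __init__(self, rows):
        self._rows = rows

    def get_range(self):
        # Like pycassa, yield (row_key, columns) pairs one row at a time.
        for key in sorted(self._rows):
            yield key, self._rows[key]


def column_counts(column_family):
    """Walk every row and record how many columns each one has."""
    counts = {}
    for row_id, row in column_family.get_range():
        counts[row_id] = len(row)
    return counts


# Hypothetical data, purely for illustration.
cf = FakeColumnFamily({"a": {"c1": "v"}, "b": {"c1": "v", "c2": "w"}})
print(column_counts(cf))
```

Because get_range() is a generator, rows are processed one at a time rather than loaded all at once, which matters when the column family is large.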

Re: Invalid Counter Shard errors?

2012-09-06 Thread Peter Schuller
This problem is not new to 1.1. On Sep 6, 2012 5:51 AM, Radim Kolar h...@filez.com wrote: i would migrate to 1.0 because 1.1 is highly unstable.

Re: Cassandra 1.1.1 on Java 7

2012-09-08 Thread Peter Schuller
Has anyone tried running 1.1.1 on Java 7? Have been running jdk 1.7 on several clusters on 1.1 for a while now. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-10 Thread Peter Schuller
by inter-region pointers). If you can avoid that, one might hope to avoid full gc:s altogether. The jury is still out on my side; but like I said, I've seen promising indications. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller
question is how often. But given the lack of handling of such failure modes, the effect on clients is huge. Recommend data reads by default to mitigate this and a slew of other sources of problems (and for counter increments, we're rolling out least-active-request routing). -- / Peter Schuller

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller
it is in action. FWIW, J9's balanced collector is very similar to G1 in its design. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-12 Thread Peter Schuller
Our full gc:s are typically not very frequent. Few days or even weeks in between, depending on cluster. *PER NODE* that is. On a cluster of hundreds of nodes, that's pretty often (and all it takes is a single node). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
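
To put rough numbers on the per-node versus per-cluster point, here is a back-of-the-envelope sketch. It assumes full GCs on different nodes are independent and spread evenly over time, and the figures (one full GC per node per week, 200 nodes) are made up for illustration.

```python
def cluster_full_gc_interval_hours(per_node_interval_days, node_count):
    """Average time between full GCs somewhere in the cluster, in hours.

    Assumes nodes pause independently and evenly spread over time.
    """
    return per_node_interval_days * 24.0 / node_count


# Hypothetical cluster: each node does a full GC about once a week.
hours = cluster_full_gc_interval_hours(per_node_interval_days=7, node_count=200)
print("a full GC somewhere in the cluster roughly every %.2f hours" % hours)
```

Under these assumptions a weekly event per node becomes an hourly event for the cluster as a whole, which is why rare per-node pauses still matter to clients.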

Re: Changing bloom filter false positive ratio

2012-09-14 Thread Peter Schuller
sstable will effectively cover almost the entire range (since you're effectively spraying random tokens at it, unless clients are writing data in md5 order). (Maybe it's different for ordered partitioning though.) -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-15 Thread Peter Schuller
the causes of unpredictable behavior w.r.t. GC by being careful about its memory allocation and *retention* profile. For the specific case of avoiding *ever* seeing a full gc, it gets even more complex. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Invalid Counter Shard errors?

2012-09-20 Thread Peter Schuller
the top of my head) the resulting value being correct is if the later increment (N2 in this case) is somehow including N1 as well (e.g., because it was generated by first reading the current counter value). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Invalid Counter Shard errors?

2012-09-20 Thread Peter Schuller
be safely retried. Cassandra counters are generally not useful if *strict* correctness is desired, for this reason. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: JVM 7, Cass 1.1.1 and G1 garbage collector

2012-09-24 Thread Peter Schuller
with slightly changed workloads? It's very hard to blackbox-test GC settings, which is probably why GC tuning can be perceived as a useless game of whack-a-mole. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Why data tripled in size after repair?

2012-09-26 Thread Peter Schuller
a single sstable bigger than what would normally happen, and it takes more total disk space before it will be part of a compaction again. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Why data tripled in size after repair?

2012-10-01 Thread Peter Schuller
of cassandra? It's in the 1.1 branch; I don't remember if it went into a release yet. If not, it'll be in the next 1.1.x release. -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: nodetool cleanup

2012-10-22 Thread Peter Schuller
On Oct 22, 2012 11:54 AM, B. Todd Burruss bto...@gmail.com wrote: does nodetool cleanup perform a major compaction in the process of removing unwanted data? No.

Re: Java 7 support?

2012-10-24 Thread Peter Schuller
FWIW, we're using openjdk7 on most of our clusters. For those where we are still on openjdk6, it's not because of an issue - just haven't gotten to rolling out the upgrade yet. We haven't had any issues that I recall with upgrading the JDK. -- / Peter Schuller (@scode, http

Re: Simulating a failed node

2012-10-28 Thread Peter Schuller
of a different story and if you want to test behavior when nodes go down I suggest including that. See CASSANDRA-2540 and CASSANDRA-3927.) -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Read IO

2013-02-20 Thread Peter Schuller
settings (typically trading pollution of page cache vs. number of I/O:s). -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Multi-DC Deployment

2011-04-21 Thread Peter Schuller
to skip rows that are obviously bad, but true integrity checking is not supported at this time. -- / Peter Schuller

Re: decommissioning a wrong node

2011-04-24 Thread Peter Schuller
. the common case of wanting to listen on 127.0.0.1 but no public interfaces... -- / Peter Schuller

Re: Performance tests using stress testing tool

2011-04-28 Thread Peter Schuller
haywire as you don't service as many I/O requests as are coming in. There is a grey area in between where latency will be very sensitive to smallish changes in I/O load but aggregate throughput remaining below what can be sustained. -- / Peter Schuller

Re: Performance tests using stress testing tool

2011-04-28 Thread Peter Schuller
. -- / Peter Schuller

Re: OOM on heavy write load

2011-04-28 Thread Peter Schuller
usage during periods of timeouts. If the huge allocations fail due to fragmentation and fall back to full GC, that might be an expected result. Else -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps. -- / Peter Schuller

Re: OOM on heavy write load

2011-04-28 Thread Peter Schuller
issues. *Maybe* after two full gc:s tops if the first happens while there's a mix still active in memtables. -- / Peter Schuller

Re: Performance tests using stress testing tool

2011-04-29 Thread Peter Schuller
Thanks Peter. I am using the Java version of the stress testing tool from the contrib folder. Is there any issue that I should be aware of? Do you recommend using pystress? I just saw Brandon file this: https://issues.apache.org/jira/browse/CASSANDRA-2578 Maybe that's it. -- / Peter Schuller

Re: Backup full cluster

2011-05-04 Thread Peter Schuller
the former row key gets restored to a point in time prior to that of the latter row key may cause the latter write to become visible even though the former write is lost. -- / Peter Schuller

Re: Backup full cluster

2011-05-04 Thread Peter Schuller
: Think before typing. :) -- / Peter Schuller

Re: Decommissioning node is causing broken pipe error

2011-05-05 Thread Peter Schuller
that the need for major compactions is significantly lessened or even eliminated. However, running major compactions won't cause tombstones *not* to be removed; it's just not required *in order* for them to be removed. -- / Peter Schuller

Re: compaction strategy

2011-05-07 Thread Peter Schuller
. -- / Peter Schuller

Re: compaction strategy

2011-05-07 Thread Peter Schuller
to the effects of the temporary spike in data size and cache coldness. Sounds like it makes good sense in your situation though. -- / Peter Schuller

Re: strange behaviour in cassandra

2011-05-08 Thread Peter Schuller
lead to a delay of sstable removal which will vary with whatever else is happening (the more busy the node, the more often a concurrent mark/sweep gc phase is triggered, and the more frequently obsolete sstables are deleted). -- / Peter Schuller

Re: Index interval tuning

2011-05-09 Thread Peter Schuller
to judge and likely depends a lot on i/o scheduling and other details. -- / Peter Schuller

Re: Index interval tuning

2011-05-10 Thread Peter Schuller
store? So the only thing I can do is test it and see how it goes. To make the change effective, should I do anything beyond changing the value in cassandra.yaml and restarting the node? I'll try first with 256 and see what happens. That should be it. -- / Peter Schuller

Re: Finding big rows

2011-05-11 Thread Peter Schuller
What is the best way to find keys of such big rows? One way, if not necessarily the best, is to check system.log for large row warnings that trigger for rows large enough to be compacted lazily. Grep for 'azy' (or for 'lazy', case-insensitively) and you should find it. -- / Peter Schuller

Re: Commitlog Disk Full

2011-05-12 Thread Peter Schuller
setting in question is the memtable_flush_after setting. Do you have that set to something very high on one of your column families? You can use describe keyspace name_of_keyspace in cassandra-cli to check current settings. -- / Peter Schuller

Re: Commitlog Disk Full

2011-05-13 Thread Peter Schuller
? Including overwrites. If not, I'm not sure what's going on. Since you said it took about a day of traffic it feels fishy. -- / Peter Schuller

Re: Monitoring bytes read per cf

2011-05-13 Thread Peter Schuller
to be, due to the LRU:ishness of caches, the less frequently accessed data that tends to make it difficult to judge by numbers that include all I/O. -- / Peter Schuller

Re: Native heap leaks?

2011-05-15 Thread Peter Schuller
Ok, so I think I found one major cause contributing to the increasing Nice job tracking this down! That is useful to know, even outside of Cassandra use cases. Frankly it's disappointing to learn what nio is doing. -- / Peter Schuller

Re: Inconsistent data issues when running nodetool move.

2011-05-15 Thread Peter Schuller
is useful since it allows consistency semantics similar to ALL but allows you to survive nodes being down, at the cost of a higher RF (3 at least)) -- / Peter Schuller
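
Reading the truncated snippet above as being about QUORUM (an assumption, since the start of the message is cut off), the arithmetic behind "3 at least" can be sketched as follows. The function names are made up for this example.

```python
def quorum(rf):
    """Number of replicas that make up a quorum for replication factor rf."""
    return rf // 2 + 1


def tolerated_down(rf):
    """Replicas that may be down while quorum operations still succeed."""
    return rf - quorum(rf)


def reads_see_writes(rf, read_replicas, write_replicas):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_down(rf))
```

With RF=2 a quorum is still 2 replicas, so no node can be down; RF=3 is the smallest value where quorum reads and writes overlap and one replica can be lost.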

Re: Cassandra and concurrent programming

2011-05-16 Thread Peter Schuller
/) for that. -- / Peter Schuller

Re: Gossiper question

2011-05-18 Thread Peter Schuller
/browse/CASSANDRA-2554 which may be relevant, but simple overload is also a possible reason. -- / Peter Schuller

Re: repair question

2011-05-23 Thread Peter Schuller
. Particularly in a situation with lots of dropped messages. I'm getting the 2^15 from AntiEntropyService.Validator.Validator() which passes a maxsize of 2^15 to the MerkelTree constructor. -- / Peter Schuller

Re: repair question

2011-05-24 Thread Peter Schuller
the more I think of it ;) -- / Peter Schuller

Re: repair question

2011-05-24 Thread Peter Schuller
Hmmm, I'm starting to like this idea more and more the more I think of it ;) Filed: https://issues.apache.org/jira/browse/CASSANDRA-2699 -- / Peter Schuller

Re: sync commitlog in batch mode lose data

2011-05-31 Thread Peter Schuller
and cassandra service 5). read the key list generated in step 2) with consistency level ONE How sure are you that the system is honoring fsync() properly, including flushing any caches on underlying drives? Or is this with battery backed caching RAID controllers? -- / Peter Schuller

Re: sync commitlog in batch mode lose data

2011-06-03 Thread Peter Schuller
will legitimately get very low values without it indicating anything is wrong.) -- / Peter Schuller

Re: sync commitlog in batch mode lose data

2011-06-07 Thread Peter Schuller
kernels) barriers at the OS level - and the list goes on. -- / Peter Schuller

Re: Retrieving a column from a fat row vs retrieving a single row

2011-06-08 Thread Peter Schuller
adding a lot of overhead in terms of disk I/O unless your data set fits comfortably in memory. -- / Peter Schuller

Re: repair and amount of transfers

2011-06-14 Thread Peter Schuller
at the same time. -- / Peter Schuller

Re: Forcing Cassandra to free up some space

2011-06-15 Thread Peter Schuller
changes so I'm not sure off hand what'll happen to the auto-system.gc code in cassandra that attempts to free space. CASSANDRA-2521 is IMO the real solution. -- / Peter Schuller

Re: Force a node to form part of quorum

2011-06-16 Thread Peter Schuller
It would be great if Cassandra puts this on their roadmap. There are a lot of durability benefits to incorporating dc awareness into the write consistency equation. You may be interested in the discussion here: https://issues.apache.org/jira/browse/CASSANDRA-2338 -- / Peter Schuller

Re: Pruning commit logs manually

2011-06-17 Thread Peter Schuller
a memtable is flushed can the commit log which contains the data being flushed, be removed by Cassandra. -- / Peter Schuller

Re: CommitLog replay

2011-06-21 Thread Peter Schuller
when it starts up the thrift interface - check system.log. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller
benefits from this. The only hard requirement is the repair schedule relative to GC grace time, and that requirement does not change - just be mindful of the timing of the EBS snapshots and what that means to your repair schedule. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller
EBS volume atomicity is good. We've had tons of experience since EBS came out almost 4 years ago,  to back all kinds of things, including large DBs. And thanks a lot for coming forward with production experience. That is always useful with these things. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller
about freeze maybe being probabilistically useful anyway. -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller
this in detail. I should really start working off the backlog of those blog entries... -- / Peter Schuller

Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?

2011-06-23 Thread Peter Schuller
certain goals, including crash consistency. It still relies on the same fundamental properties of the underlying storage device. -- / Peter Schuller

Re: Cassandra ACID

2011-06-24 Thread Peter Schuller
choosing batch commit log sync instead of periodic if single-node durability or post-quorum-write durability is a concern. -- / Peter Schuller

Re: AntiEntropy?

2011-07-11 Thread Peter Schuller
-- / Peter Schuller

Re: Node repair questions

2011-07-11 Thread Peter Schuller
/networking load and is not yet rate limited like compaction. In addition be aware that repair can cause disk space usage to temporarily increase if there are significant differences to be repaired. -- / Peter Schuller

Re: Node repair questions

2011-07-11 Thread Peter Schuller
to be. -- / Peter Schuller

Re: Re: AntiEntropy?

2011-07-12 Thread Peter Schuller
must be scheduled by the operator to run regularly. The name repair is a bit unfortunate; it is not meant to imply that it only needs to run when something is wrong. -- / Peter Schuller

Re: Meaning of 'nodetool repair has to run within GCGraceSeconds'

2011-07-12 Thread Peter Schuller
new sstables) which is expensive for the usual reasons with disk I/O; it's major since it covers all data. The data read is in fact used to calculate a merkle tree for comparison with neighbors, as claimed. -- / Peter Schuller
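
The general idea of comparing data via a merkle tree can be sketched in a few lines. This is a generic textbook construction using SHA-256, offered only as an illustration; it is not Cassandra's own tree, which is built over token ranges, and the sample rows are made up.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Hash leaves pairwise, level by level, until a single root remains."""
    if not leaves:
        return _h(b"")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Two hypothetical replicas: identical except for one row.
replica_a = [b"row1=x", b"row2=y", b"row3=z"]
replica_b = [b"row1=x", b"row2=CHANGED", b"row3=z"]
print(merkle_root(replica_a) == merkle_root(replica_a[:]))  # same data
print(merkle_root(replica_a) == merkle_root(replica_b))     # differing data
```

Equal roots mean the two sides agree without exchanging the data itself; when roots differ, comparing subtrees narrows down which part needs to be transferred.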

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread Peter Schuller
and your memory is enough to keep the hot set, and your disk I/O is coming from the long tail, increasing the amount of cache to 200 gig may not necessarily give you a huge improvement in terms of percentages. -- / Peter Schuller

Re: Anyone using Facebook's flashcache?

2011-07-12 Thread Peter Schuller
be 10 times that of the original cache. I did a quick Google but didn't find a good piece describing it more properly, but hopefully the above is helpful. Some related reading might be http://en.wikipedia.org/wiki/Long_Tail -- / Peter Schuller

Re: Re: Re: AntiEntropy?

2011-07-12 Thread Peter Schuller
documented in the link I sent before, unless you have specific reasons not to and know what you're doing. -- / Peter Schuller

Re: Re: Re: Re: AntiEntropy?

2011-07-13 Thread Peter Schuller
repair take? So basically, leave significant margin. -- / Peter Schuller
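
One way to make "leave significant margin" concrete is a small schedule check. This is a simplified model, not an official rule: it treats the time from the start of one repair to the completion of the next as interval plus duration, and requires that, plus a safety margin, to fit within GC grace. All figures are hypothetical except the 10-day value, which is the usual default for gc_grace_seconds (864000).

```python
def repair_schedule_ok(interval_days, duration_days, gc_grace_days=10,
                       margin_days=1):
    """Simplified check: interval + duration + margin must fit in GC grace."""
    return interval_days + duration_days + margin_days <= gc_grace_days


# Weekly repairs that take about a day fit with a one-day margin...
print(repair_schedule_ok(interval_days=7, duration_days=1))
# ...but repairing every nine days does not.
print(repair_schedule_ok(interval_days=9, duration_days=1))
```

The point of including duration_days is the question quoted above: a repair that takes long to finish eats into the window just as a long interval does.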

Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller
dependent on some marker that isn't written until commit log synch.) -- / Peter Schuller (@scode on twitter)

Re: commitlog replay missing data

2011-07-13 Thread Peter Schuller
# wait for a bit until no one is sending it writes anymore More accurately, until all other nodes have realized it's down (nodetool ring on each respective host). -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-13 Thread Peter Schuller
? -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
to increase RF. If you *really* know what you're doing and why you want RF to track total node count, I'm sure there are *some* cases where this makes sense. But nothing you've said so far really indicates you're in such a position. -- / Peter Schuller (@scode on twitter)

Re: JNA to avoid swap but physical memory increase

2011-07-15 Thread Peter Schuller
really have permission to do the mlockall()? (Not that I disagree in any way that swap should be disabled, +1 on that.) -- / Peter Schuller (@scode on twitter)

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Peter Schuller
. -- / Peter Schuller (@scode on twitter)

Re: Cache layer in front of cassandra... any help / suggestions?

2011-07-15 Thread Peter Schuller
checking out: https://issues.apache.org/jira/browse/CASSANDRA-1283 https://issues.apache.org/jira/browse/CASSANDRA-1969 If it can be done via that the nice thing is that you don't lose consistency. -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
of consistency level *never* affects *which* nodes are responsible for a given row key, nor does it affect which rows will eventually receive writes. It *only* affects how many nodes must respond before the operation (read or write) is considered successful. Does that make it clearer? -- / Peter

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
will be serving the requests. -- / Peter Schuller (@scode on twitter)

Re: Replicating to all nodes

2011-07-15 Thread Peter Schuller
. It is not the case that having RF be equal to the cluster size is in and of itself a useful property. -- / Peter Schuller (@scode on twitter)

Re: How are column sort handled?

2011-07-18 Thread Peter Schuller
decrease read performance, as the average row can become more spread out over multiple sstables. This is one potential driver for compaction. -- / Peter Schuller (@scode on twitter)

Re: How are column sort handled?

2011-07-18 Thread Peter Schuller
Thanks! Then does it mean that before compaction if read call comes for that key sort is done at the read time since column b, c and a are in different ssTables. Essentially yes; a merge-sort happens (since they are sorted locally in each sstable). -- / Peter Schuller (@scode on twitter)
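
The read-time merge described above can be sketched with Python's heapq.merge, which combines already-sorted inputs without re-sorting them. The three lists stand in for sstables that each hold some of a row's columns in sorted order; the column names and values are invented, and this is an illustration of the idea rather than Cassandra's actual read path.

```python
import heapq

# Each "sstable" holds part of the row, sorted by column name.
sstable_1 = [("b", "v1"), ("e", "v2")]
sstable_2 = [("c", "v3")]
sstable_3 = [("a", "v4"), ("d", "v5")]


def merged_columns(*sstables):
    """Merge per-sstable sorted columns into one sorted column list."""
    return list(heapq.merge(*sstables, key=lambda column: column[0]))


for name, value in merged_columns(sstable_1, sstable_2, sstable_3):
    print(name, value)
```

Because each input is already sorted, the merge only ever compares the current head of each sstable, so the cost grows with the number of sstables touched. That is the sense in which fewer sstables per row, after compaction, makes reads cheaper.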

Re: host clocks

2011-07-25 Thread Peter Schuller
with respect to.) Clocks should be synchronized, yes. But either your data model is such that conflicting writes are okay, or you need external co-ordination. There's no hoping for the best by keeping clocks better in synch. -- / Peter Schuller (@scode on twitter)

Re: Predictable low RW latency, SLABS and STW GC

2011-07-26 Thread Peter Schuller
effects on caches like streaming through all the bf and index files, restarts are certainly detrimental to the page cache. Also you may still see some eviction (even if it doesn't *necessarily* happen) depending (particularly if not running with numactl set to interleave). -- / Peter Schuller

Re: Too many open files

2011-07-27 Thread Peter Schuller
nodes that run into this. -- / Peter Schuller (@scode on twitter)

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Peter Schuller
construct a benchmark where there's no difference, yet see a significant difference in a real-world scenario when your benchmarked I/O is intermixed with other I/O. Not to mention subtle differences in behaviors of kernels, RAID controllers, disk drive controllers, etc... -- / Peter Schuller (@scode

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-29 Thread Peter Schuller
that cost is not specific to mmap():ed files (once a given page is in core that is). (But again, I'm not arguing the point in Cassandra's case; just generally.) -- / Peter Schuller (@scode on twitter)

Re: Question about eventually consistent

2011-07-31 Thread Peter Schuller
-ordination outside of Cassandra, there will be the potential for multiple clients reading and writing without awareness of each other. Whatever the behavior is, your data model must be such that this is acceptable. -- / Peter Schuller (@scode on twitter)

Re: Unable to repair a node

2011-08-14 Thread Peter Schuller
that ReadStage is usually full (@ your limit)? -- / Peter Schuller (@scode on twitter)

Re: Unable to repair a node

2011-08-14 Thread Peter Schuller
with RF=3, then for whatever ranges of the ring that node was partially responsible for, only 2 of the 3 copies will be up/available. They do not automatically migrate somewhere else. -- / Peter Schuller (@scode on twitter)

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller
The compaction settings do not affect repair. (Thinking out loud, or does it? Validation compactions and table builds.) It does. -- / Peter Schuller (@scode on twitter)

Re: Nodetool repair takes 4+ hours for about 10G data

2011-08-19 Thread Peter Schuller
filters and indexes. Can be CPU or I/O bound (or throttled) - nodetool compactionstats, htop, iostat -x -k 1 -- / Peter Schuller (@scode on twitter)

Re: Urgent:!! Re: Need to maintenance on a cassandra node, are there problems with this process

2011-08-19 Thread Peter Schuller
pasted; periodic repairs are part of regular cluster maintenance. See: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair -- / Peter Schuller (@scode on twitter)

Re: nodetool repair caused high disk space usage

2011-08-19 Thread Peter Schuller
netstats. If there are stuck streams, they might be causing sstable retention beyond what you'd expect. -- / Peter Schuller (@scode on twitter)

Re: How can I patch a single issue

2011-08-19 Thread Peter Schuller
rebasing is necessary. You might try a trunk from further back in time (around the time Stu submitted the patch). I'm not quite sure what your actual problem is though; if it's source code access then the easiest route is probably to check it out from https://github.com/apache/cassandra -- / Peter

Re: Occasionally getting old data back with ConsistencyLevel.ALL

2011-08-19 Thread Peter Schuller
) that indicates the items you specifically say are processed twice, were in fact written twice to Cassandra? -- / Peter Schuller (@scode on twitter)

Re: Unable to repair a node

2011-08-19 Thread Peter Schuller
is if there is a very small amount of data (yet non-zero), or significant amounts. -- / Peter Schuller (@scode on twitter)
